NONPARAMETRIC ESTIMATION IN RANDOM COEFFICIENTS BINARY

Abstract. This paper considers random coefficients binary choice models. The main goal is to
estimate the density of the random coefficients nonparametrically. This is an ill-posed inverse
problem characterized by an integral transform. A new density estimator for the random coefficients
is developed, utilizing Fourier-Laplace series on spheres. This approach offers a clear insight into
the identification problem. More importantly, it leads to a closed form estimator formula that yields
a simple plug-in procedure requiring no numerical optimization. The new estimator, therefore, is easy
to implement in empirical applications, while being flexible about the treatment of unobserved
heterogeneity. Extensions including treatments of non-random coefficients and models with endogeneity
are discussed.

1. Introduction 

Consider a binary choice model

(1.1)  $Y = \mathbb{I}\{X'\beta > 0\}$

where $\mathbb{I}$ denotes the indicator function and $X$ is a $d$-vector of covariates. We assume that
the first element of $X$ is 1, therefore the vector $X$ is of the form $X = (1, \tilde{X}')'$. The vector
$\beta$ is random. The random element $(Y, X, \beta)$ is defined on some probability space
$(\Omega, \mathcal{F}, \mathbb{P})$, and $(y_i, x_i, \beta_i)$, $i = 1, \ldots, N$ denote its realizations.
The econometrician observes $(y_i, x_i)$, $i = 1, \ldots, N$, but $\beta_i$, $i = 1, \ldots, N$ remain
unobserved. The vectors $X$ and $\beta$ correspond to observed and unobserved heterogeneity across agents,
respectively. Note that the first element of $\beta$ in this formulation absorbs the usual scalar stochastic
shock term as well as a constant in a standard binary choice model with non-random coefficients.
This formulation is used in Ichimura and Thompson (1998), and is convenient for the subsequent
development in this paper. Our basic model maintains exogeneity of the covariates $X$:

Assumption 1.1. $\beta$ is independent of $X$.

Section 6.3 considers ways to relax this assumption. Under (1.1) and Assumption 1.1 the choice
probability function is given by

(1.2)  $r(x) = \mathbb{P}(Y = 1 \mid X = x) = \mathbb{E}_\beta[\mathbb{I}\{x'\beta > 0\}]$.

Discrete choice models with random coefficients are useful in applied research since it is often crucial
to incorporate unobserved heterogeneity in modeling the choice behavior of individuals. There is
a vast and active literature on this topic. Recent contributions include Briesch, Chintagunta and
Matzkin (1996), Brownstone and Train (1999), Chesher and Santos Silva (2002), Hess, Bolduc and
Polak (2005), Harding and Hausman (2006), Athey and Imbens (2007), Bajari, Fox and Ryan (2007)
and Train (2003). A common approach in estimating random coefficient discrete choice models is to
impose parametric distributional assumptions. A leading example is the mixed Logit model, which is
discussed in detail by Train (2003). If one does not impose a parametric distributional assumption,
the distribution of $\beta$ itself is the structural parameter of interest. The goal for the econometrician
is then to recover it nonparametrically from the information about $r(x)$ obtained from the data.

Nonparametric treatments for unobserved heterogeneity distributions have been considered in
the literature for other models. Heckman and Singer (1984) study the issue of unobserved heterogeneity
distributions in duration models and propose a treatment by a nonparametric maximum likelihood
estimator (NPMLE). Elbers and Ridder (1982) also develop some identification results in such models.
Beran and Hall (1992) and Hoderlein et al. (2007) discuss nonparametric estimation of random
coefficients linear regression models. Despite the tremendous importance of random coefficient discrete
choice models, as exemplified in the above references, nonparametrics in these models is relatively
underdeveloped. In their important paper, Ichimura and Thompson (1998) propose an NPMLE for
the CDF of $\beta$. They present sufficient conditions for identification and prove the consistency of the
NPMLE. The NPMLE requires high dimensional numerical maximization and can be computationally
intensive even for a moderate sample size. Berry and Haile (2008) explore nonparametric identification
problems in a random coefficients multinomial choice model that often arises in empirical IO.

This paper considers nonparametric estimation of the random coefficients distribution, using a
novel approach that shares some similarities with standard deconvolution techniques. This allows us
to reconsider the identifiability of the model and obtain a constructive identification result. Moreover,
we develop a simple plug-in estimator for the density of $\beta$ that requires no numerical optimization or
integration. It is easy to implement in empirical applications, while being flexible about the treatment
of unobserved heterogeneity.

Since the scale of $\beta$ is not identified in the binary choice model, we normalize it so that $\beta$ is a
vector of Euclidean norm 1 in $\mathbb{R}^d$. The vector $\beta$ then belongs to the $(d-1)$-dimensional
sphere $\mathbb{S}^{d-1}$. This is not a restriction as long as the probability that $\beta$ is equal to 0 is 0.
Also, since only the angle between $X$ and $\beta$ matters in the binary decision $\mathbb{I}\{X'\beta > 0\}$,
we can replace $X$ by $X/\|X\|$ without any loss of information. We therefore assume that $X$ is on the
sphere $\mathbb{S}^{d-1}$ as well in the subsequent analysis. Results from the directional data literature are
thus relevant to our analysis. We aim to recover the joint probability density function $f_\beta$ of $\beta$
with respect to the uniform spherical measure $\sigma$ over $\mathbb{S}^{d-1}$ from the random sample
$(y_1, x_1), \ldots, (y_N, x_N)$ of $(Y, X)$.
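The scale normalization can be illustrated with a short Python sketch. It is only an illustration under assumed inputs: the helper names `normalize` and `choice` and the numerical values of the covariates and coefficients are hypothetical and do not come from the paper.

```python
import math

def normalize(v):
    """Scale a nonzero vector to Euclidean norm 1 (a point on the unit sphere)."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("the zero vector cannot be normalized")
    return [c / norm for c in v]

def choice(x, beta):
    """Binary decision I{x'beta > 0}."""
    return int(sum(a * b for a, b in zip(x, beta)) > 0)

# Hypothetical covariates (the first element is the constant 1) and coefficients.
x_raw = [1.0, 2.5, -0.7]
beta_raw = [0.3, -1.2, 4.0]

x_sph = normalize(x_raw)
beta_sph = normalize(beta_raw)

# Rescaling by positive constants leaves the sign of x'beta unchanged,
# so the decision is the same before and after normalization.
same_decision = choice(x_raw, beta_raw) == choice(x_sph, beta_sph)
```

Because the first covariate is the constant 1, the first coordinate of the normalized covariate vector stays positive.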

The problem considered here is a linear ill-posed inverse problem. We can write

(1.3)  $r(x) = \int_{b \in \mathbb{S}^{d-1}} \mathbb{I}\{x'b > 0\} f_\beta(b)\,d\sigma(b) = \int_{H(x)} f_\beta(b)\,d\sigma(b) := \mathcal{H}(f_\beta)(x)$

where the set $H(x)$ is the hemisphere $\{b : x'b > 0\}$. The mapping $\mathcal{H}$ is called the hemispherical
transformation. Inversion of this mapping was first studied by Funk (1916) and later by Rubin
(1999). Groemer (1996) also discusses some of its properties. $\mathcal{H}$ is not injective without further
restrictions and conditions need to be imposed to ensure identification of $f_\beta$ from $r$. Even under an
additional condition which guarantees identification, however, the inverse of $\mathcal{H}$ is not a continuous
mapping, making the problem ill-posed. To see this, suppose we restrict $f_\beta$ to be in $L^2(\mathbb{S}^{d-1})$.
Since the kernel of $\mathcal{H}$ is square integrable by compactness of the sphere, the operator is
Hilbert-Schmidt and thus compact. Therefore if the inverse of $\mathcal{H}$ were continuous,
$\mathcal{H}^{-1}\mathcal{H}$ would map the closed unit ball in $L^2(\mathbb{S}^{d-1})$ to a compact set. But the
Riesz theorem states that the unit ball is relatively compact if and only if the vector space has finite
dimension. The fact that $L^2(\mathbb{S}^{d-1})$ is an infinite dimensional space contradicts this. Therefore the
inverse of $\mathcal{H}$ cannot be continuous. In order to overcome this problem, we use a one parameter
family of regularized inverses that are continuous and converge to the inverse when the parameter goes
to infinity. This is a common approach to ill-posed inverse problems in statistics (see, e.g. Carrasco
et al., 2007).

Due to the particular form of its kernel that involves the scalar product $x'b$, the operator $\mathcal{H}$ is an
analogue of convolution in $\mathbb{R}^d$, as illustrated in a simple example in Section 2.1. This analogy
provides a clear insight into the identification issue. In particular, our problem is closely related to the
so-called boxcar deconvolution (see, e.g. Groeneboom and Jongbloed (2003) and Johnstone and Raimondo
(2004)), where identifiability is often a significant problem. The connection with deconvolution is also
useful in deriving an estimator based on a series expansion on the Fourier basis on $\mathbb{S}^1$ or its
extension to higher dimensional spheres called Fourier-Laplace series. These bases are defined via the
Laplacian on the sphere, and they diagonalize the operator $\mathcal{H}$ on $L^2(\mathbb{S}^{d-1})$. Such techniques
are used in Healy and Kim (1996) for nonparametric empirical Bayes estimation in the case of the sphere
$\mathbb{S}^2$. The kernel of the integral operator $\mathcal{H}$, however, does not satisfy the assumptions made
by Healy and Kim. Unlike Healy and Kim (1996), we make use of so-called "condensed" harmonic expansions.
The approach replaces a full expansion on a Fourier-Laplace basis by an expansion in terms of the
projections on the finite dimensional eigenspaces of the Laplacian on the sphere. This is useful since an
explicit expression of the kernel of the projector is available. It enables us to work in any dimension
and does not require a parametrization by hyperspherical coordinates nor the actual knowledge of an
orthonormal basis. This approach, to the best of our knowledge, appears to be new in the econometrics
literature.

The paper is organized as follows. In Section 2 we introduce a toy model and the tools from
harmonic analysis that are used for the development of our estimation procedure and its asymptotic
analysis. Section 3 deals with identification and presents a general estimation procedure for the random
coefficients density. In Section 4 we study a nonparametric estimator of the choice probability function
and derive its asymptotic properties. This is important since it yields a nonparametric estimator for
the random coefficients density with a simple closed form, which is the main proposal of the paper.
We derive the convergence rates of the estimators in all the $L^q$ spaces for $q \in [1, \infty]$ and also
prove a pointwise CLT in Section 5. Some extensions, such as estimation of marginals, treatments of models
with non-random coefficients, and the case with endogenous regressors are presented in Section 6.
Simulation results are reported in Section 7. Section 8 concludes.

2. Preliminaries 

This section introduces tools for making connections between the estimation of the density of
$\beta$ and a deconvolution problem, and presents some results on the hemispherical transform.

2.1. A Toy Model. As noted above, the key insight for our estimation procedure lies in the fact that the
estimation of $f_\beta$ in (1.3) is mathematically equivalent to a statistical deconvolution problem. To see
this, it is useful to first consider the case with $d = 2$. We parameterize the vectors $b = (b_1, b_2)'$ and
$x = (x_1, x_2)'$ on $\mathbb{S}^1$ by their angles $\phi = \arccos(b_1)$ and $\theta = \arccos(x_1)$ in
$[0, 2\pi)$. As is often the case when Fourier series techniques are used, we consider spaces of complex
valued functions. Let $L^p(\mathbb{S}^1)$ denote the Banach space of Lebesgue $p$-integrable functions and
denote its norm by $\|\cdot\|_p$. In the case of $L^2(\mathbb{S}^1)$, the norm is derived from the hermitian
product $\int_0^{2\pi} f(\theta)\overline{g(\theta)}\,d\theta$. Let $r_\theta$ and $f_\phi$ denote $r$ and
$f_\beta$ after the reparameterization. Our task is then to obtain $f_\phi$ from the knowledge of $r_\theta$.
Rewrite (1.3) using these definitions, then divide both sides by $\pi$, to get:

(2.1)  $\frac{r_\theta(\theta)}{\pi} = \frac{1}{\pi}\mathcal{H}(f_\beta)(x) = \int_0^{2\pi} \Big(\frac{1}{\pi}\mathbb{I}\{|\theta - \phi| < \pi/2\}\Big) f_\phi(\phi)\,d\phi.$

If we further define $f_\theta := r_\theta/\pi$ and $f_\eta(\eta) := \frac{1}{\pi}\mathbb{I}\{|\eta| < \pi/2\}$, then
using the standard notation for convolution, (2.1) can be written as $f_\theta = f_\eta * f_\phi$. It is now
obvious that the estimation of $f_\phi$ (thus $f_\beta$) is linked to the following statistical deconvolution
problem: unobservable random variables $\phi$ and $\eta$ with densities $f_\phi$ and $f_\eta$ are related to an
observable random variable $\theta$ according to $\theta = \eta + \phi$, and one wishes to recover $f_\phi$
from $f_\theta$, the density of $\theta$, when $f_\eta$ is known (and it is Uniform$[-\pi/2, \pi/2]$ in
this case).[1]

The problem of deconvolution on the unit circle can be conveniently solved using Fourier
series. The set of functions $(\exp(-int)/\sqrt{2\pi})_{n \in \mathbb{Z}}$ is the orthonormal basis of
$L^2(\mathbb{S}^1)$ used to define Fourier series. This system is also complete in $L^1(\mathbb{S}^1)$.
Reparameterize a function $f \in L^1(\mathbb{S}^1)$ using angles as above, and denote it by $f_t$. Denoting
the Fourier coefficients of $f \in L^1(\mathbb{S}^1)$ by
$c_n(f_t) = \int_0^{2\pi} f_t(t)\exp(-int)\,dt/(2\pi)$,

(2.2)  $f_t(t) = \sum_{n \in \mathbb{Z}} c_n(f_t)\exp(int)$

holds in the $L^1(\mathbb{S}^1)$ sense. Recall also that for $f$ and $g$ in $L^1(\mathbb{S}^1)$, after the same
reparameterization,

(2.3)  $c_n(f_t * g_t) = 2\pi\, c_n(f_t)\, c_n(g_t).$

Using equation (2.3) we obtain the following proposition.

Proposition 2.1. $c_0(r_\theta) = \pi c_0(f_\phi)$ and for $n \in \mathbb{Z} \setminus \{0\}$, $c_n(r_\theta) = c_n(f_\phi)\, 2\sin(n\pi/2)/n$.

[1] It is also useful to note that the inversion of $\mathcal{H}$ is closely related to differentiation.
Differentiating the right hand-side of expression (2.1) with respect to $\theta$ identifies
$f_\phi(\theta + \pi/2) - f_\phi(\theta - \pi/2)$, where $f_\phi$ is defined on the line by periodicity.
If $f_\phi$ is supported on a semicircle, with an assumption that is elaborated further in Section 3.1,
$f_\phi$ (which is positive) is identified. Thus if the model is identified the inverse of $\mathcal{H}$ is a
differential operator and as such unbounded.

As in classical deconvolution problems on the real line, our aim is to obtain $f_\phi$ (thus $f_\beta$) using
equation (2.2) and Proposition 2.1. Proposition 2.1 shows that $c_{2p}(r_\theta) = 0$ holds for all non-zero
$p$'s, regardless of the values of $c_{2p}(f_\phi)$, $p \in \mathbb{Z} \setminus \{0\}$. Thus from
$r(x) = r_\theta(\theta)$ one can only recover the Fourier coefficients $c_n(f_\phi)$ for $n = 0$ (which is
easily seen to be $1/(2\pi)$, by integrating both sides of (2.1) and noting that $f_\phi$ is a probability
density function) and $n = 2p + 1$, $p \in \mathbb{Z}$. The same phenomenon occurs in higher dimensions, as
explained in Section 2.2.
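The loss of the even frequencies can be checked numerically. The following Python sketch is an illustration only: the density `f_phi` is a hypothetical trigonometric polynomial chosen for the example, the integrals are approximated by simple quadrature rules, and the function names are not taken from the paper.

```python
import cmath
import math

def f_phi(phi):
    """Hypothetical smooth density on the circle (positive, integrates to one)."""
    return (1 + 0.4 * math.cos(phi - 1.0) + 0.3 * math.cos(2 * phi)
            + 0.2 * math.sin(3 * phi)) / (2 * math.pi)

def r_theta(theta, k=2000):
    """Choice probability: integral of f_phi over the half circle centred at theta."""
    h = math.pi / k  # midpoint rule on (theta - pi/2, theta + pi/2)
    return h * sum(f_phi(theta - math.pi / 2 + (j + 0.5) * h) for j in range(k))

def fourier_coeff(values, n):
    """c_n(g) = (1/2pi) int_0^{2pi} g(t) exp(-int) dt from equispaced samples."""
    m = len(values)
    return sum(v * cmath.exp(-2j * math.pi * n * j / m) for j, v in enumerate(values)) / m

def multiplier(n):
    """Factor linking c_n(r) to c_n(f) in the d = 2 case."""
    return math.pi if n == 0 else 2 * math.sin(n * math.pi / 2) / n

m = 64
grid = [2 * math.pi * j / m for j in range(m)]
f_vals = [f_phi(t) for t in grid]
r_vals = [r_theta(t) for t in grid]

# Discrepancy between c_n(r) and multiplier(n) * c_n(f) for n = 0, ..., 4.
errors = {n: abs(fourier_coeff(r_vals, n) - multiplier(n) * fourier_coeff(f_vals, n))
          for n in range(5)}
```

In this example the degree-2 coefficient of the density is non-zero, yet the corresponding coefficient of the choice probability vanishes up to quadrature error, so it cannot be recovered from $r$.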

Remark 2.1. The vector spaces $H^{2p+1,2} = \mathrm{span}\{\exp(i(2p+1)t)/\sqrt{2\pi},\ \exp(-i(2p+1)t)/\sqrt{2\pi}\}$,
$p \in \mathbb{N}$, are eigenspaces of the compact self-adjoint operator $\mathcal{H}$ on $L^2(\mathbb{S}^1)$.
These eigenspaces are associated with the eigenvalues $2(-1)^p/(2p+1)$. Also,
$\bigoplus_{p \in \mathbb{N} \setminus \{0\}} H^{2p,2}$ is the null space $\ker \mathcal{H}$.

2.2. Tools for Higher Dimensional Spheres. Let us introduce some concepts used for the treatment
of the general case $d > 2$. We consider functions defined on the sphere $\mathbb{S}^{d-1}$, which is a
$(d-1)$-dimensional smooth submanifold in $\mathbb{R}^d$. The canonical measure on $\mathbb{S}^{d-1}$ (or the
spherical measure) is denoted by $\sigma$. It is a uniform measure on $\mathbb{S}^{d-1}$ satisfying
$\int_{\mathbb{S}^{d-1}} d\sigma = |\mathbb{S}^{d-1}|$, where $|\mathbb{S}^{d-1}|$ signifies the surface area of
the unit sphere. The latter is given by $|\mathbb{S}^{d-1}| = 2\pi^{d/2}/\Gamma(d/2)$, where $\Gamma$ is the
usual Gamma function. $L^p(\mathbb{S}^{d-1})$ with norm $\|\cdot\|_p$ is the usual space of $p$-integrable
complex functions and $L^2(\mathbb{S}^{d-1})$ is equipped with the hermitian product
$(f, g)_{L^2(\mathbb{S}^{d-1})} = \int_{\mathbb{S}^{d-1}} f(x)\overline{g(x)}\,d\sigma(x)$. We use the following
notation throughout the paper:

Notation. For two sequences of positive numbers $(a_n)_{n \in \mathbb{N}}$ and $(b_n)_{n \in \mathbb{N}}$, we write
$a_n \asymp b_n$ when there exists a positive $M$ such that $M^{-1} b_n \leq a_n \leq M b_n$ for every positive $n$.

Recall that the basis functions $\exp(\pm int)/\sqrt{2\pi}$ are eigenfunctions of $-d^2/dt^2$ associated with
eigenvalue $n^2$. In a similar way, the Laplacian on the sphere $\mathbb{S}^{d-1}$, $d > 2$, denoted by
$\Delta^S$, can be used to obtain an orthonormal basis for higher dimensional spheres. It can be defined by
the formula

(2.4)  $\Delta^S f = (\Delta f^{\vee})^{\wedge}$

where $\Delta$ is the Laplacian in $\mathbb{R}^d$, $f^{\vee}$ the radial extension of $f$, that is
$f^{\vee}(x) = f(x/\|x\|)$, and $f^{\wedge}$ the restriction of $f$ to $\mathbb{S}^{d-1}$. Likewise the gradient on
the sphere is given by:

(2.5)  $\nabla^S f = (\nabla f^{\vee})^{\wedge}$

where $\nabla$ is the gradient in $\mathbb{R}^d$.

Definition 2.1. A surface harmonic of degree $n$ is the restriction of a homogeneous harmonic polynomial
(a homogeneous polynomial $p$ whose Laplacian $\Delta p$ is zero) of degree $n$ in $\mathbb{R}^d$ to $\mathbb{S}^{d-1}$.



The reader is referred to Müller (1966) and Groemer (1996) for clear and detailed expositions
on these concepts and important results concerning spherical harmonics used in this paper. Erdélyi
et al. (1953, vol. 2, chapter 9) provide detailed accounts focusing on special functions. Here are some
useful results:

Lemma 2.1. The following properties hold:

(i) $-\Delta^S$ is a positive self-adjoint unbounded operator on $L^2(\mathbb{S}^{d-1})$, thus it has orthogonal
eigenspaces and a basis of eigenfunctions;

(ii) Surface harmonics of degree $n$ are eigenfunctions of $-\Delta^S$ for the eigenvalue $\zeta_{n,d} := n(n + d - 2)$;

(iii) The dimension of the vector space $H^{n,d}$ of surface harmonics of degree $n$ is

(2.6)  $h(n,d) := \frac{(2n + d - 2)(n + d - 2)!}{n!\,(d - 2)!\,(n + d - 2)}$;

(iv) A system formed of orthonormal bases $(Y_{n,l})_{l=1}^{h(n,d)}$ of $H^{n,d}$ for each degree $n = 0, \ldots, \infty$
is complete in $L^1(\mathbb{S}^{d-1})$, that is, for every $f \in L^1(\mathbb{S}^{d-1})$ the following equality holds
in the $L^1(\mathbb{S}^{d-1})$ sense:

$f = \sum_{n=0}^{\infty} \sum_{l=1}^{h(n,d)} (f, Y_{n,l})_{L^2(\mathbb{S}^{d-1})}\, Y_{n,l}.$

Thus $h(n,d)$ is the multiplicity of the eigenvalue $\zeta_{n,d}$ and $H^{n,d}$ is the corresponding eigenspace.
Lemma 2.1 (i), (ii) and (iv) give the decomposition

$L^2(\mathbb{S}^{d-1}) = \bigoplus_{n \in \mathbb{N}} H^{n,d}.$

The space of surface harmonics of degree 0 is the one dimensional space spanned by 1. A series
expansion on an orthonormal basis of surface harmonics is called a Fourier series when $d = 2$, a
Laplace series when $d = 3$ and in the general case a Fourier-Laplace series.
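The dimension formula (2.6) and the eigenvalues in Lemma 2.1 (ii) are easy to evaluate. The following is a minimal Python sketch; the function names `harmonic_dim` and `laplacian_eigenvalue` are hypothetical, and the degree 0 case is handled separately because the factorial expression reads 0/0 when $d = 2$.

```python
from math import comb

def harmonic_dim(n, d):
    """Dimension h(n, d) of the space of surface harmonics of degree n on S^{d-1}."""
    if d < 2 or n < 0:
        raise ValueError("need d >= 2 and n >= 0")
    if n == 0:
        return 1  # constants only
    # (2n+d-2)(n+d-2)! / (n! (d-2)! (n+d-2)) = (2n+d-2) * C(n+d-2, n) / (n+d-2)
    return (2 * n + d - 2) * comb(n + d - 2, n) // (n + d - 2)

def laplacian_eigenvalue(n, d):
    """Eigenvalue zeta_{n,d} = n(n + d - 2) of minus the spherical Laplacian."""
    return n * (n + d - 2)

# Circle (d = 2): two harmonics per non-zero degree; sphere in R^3 (d = 3): 2n + 1.
dims_circle = [harmonic_dim(n, 2) for n in range(4)]
dims_sphere = [harmonic_dim(n, 3) for n in range(4)]
```

For $d = 2$ the two harmonics of degree $n \geq 1$ correspond to $\exp(\pm int)$, in line with the toy model.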

Orthonormal bases of surface harmonics usually involve parametrization by angles, such as the
spherical coordinates when $d = 3$ as used by Healy and Kim (1996) or hyperspherical coordinates for
$d > 3$. Instead, here we work with the decomposition of a function on the spaces $H^{n,d}$ as presented
in the next definition so that we avoid specific expressions of basis functions.

Definition 2.2. The condensed harmonic expansion of a function $f$ in $L^1(\mathbb{S}^{d-1})$ is the series
$\sum_{n=0}^{\infty} Q_{n,d} f$, where $Q_{n,d}$ is the projector from $L^2(\mathbb{S}^{d-1})$ to $H^{n,d}$.

This leads to a simple method both in terms of theoretical developments and practical
implementations. The projector $Q_{n,d}$ can be expressed as an integral operator with kernel

(2.7)  $q_{n,d}(x, y) = \sum_{l=1}^{h(n,d)} Y_{n,l}(x)\,\overline{Y_{n,l}(y)},$

where $(Y_{n,l})_{l=1}^{h(n,d)}$ is any orthonormal basis of $H^{n,d}$. The kernel has a simple expression given by
the addition formula:

Theorem 2.1 (Addition Formula). For every $x$ and $y \in \mathbb{S}^{d-1}$ we have

(2.8)  $q_{n,d}(x, y) = q^b_{n,d}(x'y), \qquad q^b_{n,d}(t) := \frac{h(n,d)\, C_n^{\nu(d)}(t)}{|\mathbb{S}^{d-1}|\, C_n^{\nu(d)}(1)}$

where $C_n^{\nu}$ are Gegenbauer polynomials and $\nu(d) = (d - 2)/2$.

The Gegenbauer polynomials are defined for $\nu > -1/2$ and are orthogonal with respect to the
weight function $(1 - t^2)^{\nu - 1/2}\,dt$ on $[-1, 1]$. Note that $C_0^{\nu}(t) = 1$ and $C_1^{\nu}(t) = 2\nu t$ for
$\nu \neq 0$ while $C_1^0(t) = 2t$. Moreover, the following recursion relation holds

(2.9)  $(n + 2)\, C_{n+2}^{\nu}(t) = 2(\nu + n + 1)\, t\, C_{n+1}^{\nu}(t) - (2\nu + n)\, C_n^{\nu}(t).$

Implementation of our estimator requires evaluation of the Gegenbauer polynomials for a series of
successive values of $n$. The recursion relation (2.9) is therefore a powerful tool. Useful results on these
polynomials are gathered in the appendix: see also Erdélyi et al. (1953, vol. 1, p. 175-179).
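As an illustration of how the recursion can be used, the Python sketch below evaluates Gegenbauer polynomials for successive degrees and plugs them into the standard form of the addition formula (kernel equal to $h(n,d)\,C_n^{\nu}(t)/(|\mathbb{S}^{d-1}|\,C_n^{\nu}(1))$). It is a sketch under assumptions, not the authors' implementation: the function names are hypothetical, and only the case $\nu > 0$ (that is, $d \geq 3$) is covered because the starting values of the recursion need separate care when $\nu = 0$.

```python
import math
from math import comb

def gegenbauer_values(n_max, nu, t):
    """Return [C_0^nu(t), ..., C_{n_max}^nu(t)] by the three-term recursion; needs nu > 0."""
    if nu <= 0:
        raise ValueError("this sketch covers nu > 0 only, i.e. d >= 3")
    vals = [1.0, 2.0 * nu * t]
    for n in range(n_max - 1):
        # (n + 2) C_{n+2} = 2 (nu + n + 1) t C_{n+1} - (2 nu + n) C_n
        vals.append((2.0 * (nu + n + 1) * t * vals[n + 1] - (2.0 * nu + n) * vals[n]) / (n + 2))
    return vals[: n_max + 1]

def sphere_area(d):
    """Surface area of the unit sphere S^{d-1} in R^d."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def harmonic_dim(n, d):
    """Dimension of the space of surface harmonics of degree n."""
    return 1 if n == 0 else (2 * n + d - 2) * comb(n + d - 2, n) // (n + d - 2)

def condensed_kernels(n_max, d, t):
    """Values of the projection kernels of degrees 0..n_max at scalar product t = x'y."""
    nu = (d - 2) / 2
    c_t = gegenbauer_values(n_max, nu, t)
    c_1 = gegenbauer_values(n_max, nu, 1.0)
    area = sphere_area(d)
    return [harmonic_dim(n, d) * c_t[n] / (area * c_1[n]) for n in range(n_max + 1)]

# For d = 3 (nu = 1/2) the Gegenbauer polynomials are the Legendre polynomials.
legendre_at_03 = gegenbauer_values(3, 0.5, 0.3)
```

Working with the scalar product $t = x'y$ directly is what removes the need for hyperspherical coordinates or an explicit basis.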

Definition 2.3. The Sobolev space $W^s_p(\mathbb{S}^{d-1})$ for $p \in [1, \infty]$ and $s > 0$ is the space of
functions $f$ in $L^p(\mathbb{S}^{d-1})$ for which the distribution

$(-\Delta^S)^{s/2} f := \sum_{n=0}^{\infty} \zeta_{n,d}^{s/2}\, Q_{n,d} f$

belongs to $L^p(\mathbb{S}^{d-1})$. It is equipped with the norm

$\|f\|_{p,s} = \|f\|_p + \|(-\Delta^S)^{s/2} f\|_p.$

For the case where $p = 2$, that is, for the Sobolev space $H^s(\mathbb{S}^{d-1}) := W^s_2(\mathbb{S}^{d-1})$, it is
also possible to use an equivalent norm, the square of which is equal to

$\sum_{n=0}^{\infty} (1 + \zeta_{n,d})^s\, \|Q_{n,d} f\|_2^2.$

Note that the following integration by parts holds for functions $f$ in $H^1(\mathbb{S}^{d-1})$

(2.10)  $-\int_{\mathbb{S}^{d-1}} f(x)\,\Delta^S f(x)\,d\sigma(x) = \int_{\mathbb{S}^{d-1}} \nabla^S f(x)'\,\nabla^S f(x)\,d\sigma(x)$

and as a consequence for the second definition of the norm of $H^1(\mathbb{S}^{d-1})$ we have

$\|f\|_{2,1}^2 = \|f\|_2^2 + \|\nabla^S f\|_2^2.$

We use these Sobolev spaces to make smoothness assumptions.

In Section 2.1 we observed the close relationship between the random coefficient binary choice
model and convolution for $d = 2$. This connection remains valid in higher dimensions. Suppose
a function $f(x, y)$ defined on $\mathbb{S}^{d-1} \times \mathbb{S}^{d-1}$ depends on $x$ and $y$ only through the
spherical distance $d(x, y) = \arccos(x'y)$ (that is, $f$ is a zonal function). Consider the following integral:

$h(x) = \int_{\mathbb{S}^{d-1}} f(x, y)\, g(y)\,d\sigma(y) := f * g(x),$

then the function $h$ is a convolution on the sphere. We now see that the choice probability function
$r(x) = \mathcal{H}(f_\beta)(x) = \int_{\mathbb{S}^{d-1}} \mathbb{I}\{x'b > 0\} f_\beta(b)\,d\sigma(b)$ is a special case of
$h$ and therefore can also be regarded as convolution. Obtaining $f_\beta$ from $r$ (or, inverting
$\mathcal{H}$) is therefore a deconvolution problem.

In what follows we often write $f(x, \star)$ when a function $f$ on $\mathbb{S}^{d-1} \times \mathbb{S}^{d-1}$ is
regarded as a function of $\star$. Also, the notation $\|f(x, \star)\|_p$ is used for the $L^p$ norm of
$f(x, \star)$, that is, $\|f(x, \star)\|_p^p = \int_{\mathbb{S}^{d-1}} |f(x, y)|^p\,d\sigma(y)$. Note that if $f$ is
a zonal function as in the above definition of spherical convolution, its $L^p$ norm $\|f(x, \star)\|_p$ does
not depend on $x$. The following Young inequalities for convolution on the sphere (see, for example,
Kamzolov, 1983) are useful:

Proposition 2.2 (Young inequalities). Suppose $f(x, \star)$ and $g$ belong to $L^r(\mathbb{S}^{d-1})$ and
$L^p(\mathbb{S}^{d-1})$, respectively. Then $h(x) = f * g(x)$ is well-defined in $L^q(\mathbb{S}^{d-1})$ and

$\|h\|_q \leq \|f\|_r\, \|g\|_p,$

where $1 \leq p, q, r \leq \infty$ and $\frac{1}{q} = \frac{1}{p} + \frac{1}{r} - 1$.

Let $P_T$ denote the projection operator onto $\bigoplus_{n=0}^{T} H^{n,d}$, i.e.

$P_T f(x) = \sum_{n=0}^{T} Q_{n,d} f(x) = \int_{\mathbb{S}^{d-1}} D_T(x, y)\, f(y)\,d\sigma(y)$

where

$D_T(x, y) = \sum_{n=0}^{T} q_{n,d}(x, y).$

The kernel $D_T$ extends the classical Dirichlet kernel on the circle to the sphere $\mathbb{S}^{d-1}$. The sum
over $n = 0, \ldots, T$ in the definition of $D_T$ also has a simple closed form in terms of derivatives of
Gegenbauer polynomials; see Equation (52) in Müller (1966). The linear form
$f \mapsto \int_{\mathbb{S}^{d-1}} D_T(x, y) f(y)\,d\sigma(y)$ converges to
$\int_{\mathbb{S}^{d-1}} f(y)\,d\delta_x(y) = f(x)$ as $T$ goes to infinity, where $\delta_x$ denotes the Dirac
measure. The Dirichlet kernel yields the best approximation $P_T f$ of $f$ in $L^2(\mathbb{S}^{d-1})$ by
polynomials that belong to $\bigoplus_{n=0}^{T} H^{n,d}$, but is known to have flaws. For example, $D_T$ does
not satisfy

$\forall f \in L^1(\mathbb{S}^{d-1}) \quad \lim_{T \to \infty} \|D_T * f - f\|_{L^1(\mathbb{S}^{d-1})} = 0,$

that is, the sequence $D_T$, $T = 0, 1, \ldots$ is not an approximate identity (see, e.g., Devroye and Gyorfi
1985) in $L^1(\mathbb{S}^{d-1})$. Indeed, the $L^1(\mathbb{S}^{d-1})$ norm of the kernel is not uniformly bounded;
more precisely, we have

(2.11)  $\|D_T(x, \star)\|_1 \asymp T^{(d-2)/2}$

when $d \geq 3$ and

(2.12)  $\|D_T(x, \star)\|_1 \asymp \log T$

when $d = 2$ (as noted above, these norms do not depend on the value of $x \in \mathbb{S}^{d-1}$). These bounds
can be found in Gronwall (1914) for $d = 3$ and Ragozin (1972) and Colzani and Travaglini (1991) for
higher dimensions. Also, $D_T$ does not have good approximation properties in $L^\infty(\mathbb{S}^{d-1})$; in
particular, we do not have

$\forall f \in L^\infty(\mathbb{S}^{d-1}) \quad \lim_{T \to \infty} \|D_T * f - f\|_{L^\infty(\mathbb{S}^{d-1})} = 0.$

Near the points of discontinuity of $f$, $D_T * f$ has oscillations which do not decay to zero as $T$ grows to
infinity, known as the Gibbs oscillations. This phenomenon becomes more severe as the dimension increases.
These problems can be addressed by using kernels that involve extra smoothing instead of the Dirichlet
kernel $D_T$. To this end, define a general class of kernels

(2.13)  $K_T(x, y) = \sum_{n=0}^{T} \chi(n, T)\, q_{n,d}(x, y)$

for some sequence $\chi(n, T)$. These are called smoothed projection kernels. Typically the function $\chi$
is chosen so that it puts more weight on lower frequencies. In particular we impose the following
conditions:

Assumption 2.1. (i) $\|K_T(x, \star)\|_1$ is uniformly bounded in $T$.
(ii) There exist constants $C$ and $\alpha$ such that for all $x, y, z \in \mathbb{S}^{d-1}$,

$|K_T(z, x) - K_T(z, y)| \leq C \|x - y\|\, T^{\alpha},$

where $\|\cdot\|$ denotes the Euclidean norm.
(iii) For $p \in [1, \infty]$ and $s > 0$, there exists a constant $C$ such that for every $f$ in $W^s_p(\mathbb{S}^{d-1})$,

$\Big\| f(\cdot) - \int_{\mathbb{S}^{d-1}} K_T(\cdot, y)\, f(y)\,d\sigma(y) \Big\|_p \leq C\, T^{-s}\, \|f\|_{p,s}.$

(iv) $\chi(\cdot, T)$ takes values in $[0, 1]$ and is such that there exists $c > 0$ such that for all
$0 \leq n \leq \lfloor T/2 \rfloor$, $\chi(n, T) \geq c$.

The smoothed projection kernel $K_T(x, y)$ depends on $x$ and $y$ only through $d(x, y)$, thus the
value of the norm in Assumption 2.1 (i) does not depend on $x \in \mathbb{S}^{d-1}$. Assumption 2.1 (i) could
be relaxed, but imposing this on $K_T$ allows us to make relatively weak assumptions on the smoothness
of the density of the covariates later in this paper. Assumption 2.1 (ii) is used to establish the
$L^\infty$-rates of convergence of our estimators. Assumption 2.1 (iii) provides bounds for approximation
errors. Under this condition, $K_T * f$ approximates $f$ in $L^p(\mathbb{S}^{d-1})$ with an error of the same
order as that of the best $n$-th degree spherical harmonic approximation of a function $f \in L^p(\mathbb{S}^{d-1})$
in $W^s_p(\mathbb{S}^{d-1})$ (see e.g. Kamzolov 1983 and Ditzian 1998). This is useful in our treatment of the
bias terms in our estimators. As concrete examples, the following two choices for the weight function
$\chi$ in (2.13) satisfy Assumption 2.1, as shown in the appendix. The first and the second choices of
$\chi$ correspond to the Riesz kernel and the delayed means kernel, respectively.

Proposition 2.3. In the definition of the smoothed kernel (2.13), let

where $l$ is an integer satisfying $l > (d - 2)/2$, or

$\chi(n, T) = \psi(n/T), \quad T = 2^j$ for some $j$

where $\psi : [0, \infty) \to [0, \infty)$ is infinitely differentiable, nonincreasing, such that $\psi(x) = 1$ if
$x < 1/2$ and $\psi(x) = 0$ if $x > 1$. Then $K_T$ satisfies Assumption 2.1.

The delayed means kernel has the nice property that it does not require prior knowledge of the
regularity $s$ in Assumption 2.1. The Dirichlet kernel satisfies (ii), (iii) (for $p = 2$) and (iv) of
Assumption 2.1. Like the delayed means kernel, it achieves the optimal rate of approximation without the
prior knowledge of $s$.
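The proposition leaves the choice of the cutoff function $\psi$ open. The Python sketch below shows one possible choice that meets the stated requirements, built from the classical smooth bump $\exp(-1/u)$; this particular $\psi$ and the function names are assumptions made for illustration and are not taken from the paper.

```python
import math

def _g(u):
    """Smooth function that vanishes for u <= 0 and is positive for u > 0."""
    return math.exp(-1.0 / u) if u > 0 else 0.0

def psi(x):
    """Infinitely differentiable, nonincreasing cutoff: 1 for x <= 1/2, 0 for x >= 1."""
    a = _g(1.0 - x)   # vanishes once x >= 1
    b = _g(x - 0.5)   # vanishes while x <= 1/2
    return a / (a + b)  # a + b > 0 for every x

def delayed_means_weights(T):
    """Weights chi(n, T) = psi(n / T) for n = 0, ..., T."""
    return [psi(n / T) for n in range(T + 1)]

# Hypothetical truncation level T = 2^3.
weights = delayed_means_weights(8)
```

With this choice the weights equal 1 for every $n \leq T/2$, so the lower bound in Assumption 2.1 (iv) holds with $c = 1$.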

The notion of the odd and even part of a function defined on the sphere is important in the
development of our identification analysis.

Definition 2.4. We denote the odd part and the even part of a function $f$ by

$f^-(b) = (f(b) - f(-b))/2$

and

$f^+(b) = (f(b) + f(-b))/2,$

respectively, for every $b$ in $\mathbb{S}^{d-1}$.

If the function $f$ is in $L^2(\mathbb{S}^{d-1})$ then Equations (2.7) and (2.8) imply that
$Q_{2p,d} f(x) = Q_{2p,d} f(-x)$ and $Q_{2p+1,d} f(x) = -Q_{2p+1,d} f(-x)$ for $p \in \mathbb{N}$. Consequently, the
odd order terms in the condensed harmonic expansions of $f$, $f^+$ and $f^-$ satisfy
$Q_{2p+1,d} f^- = Q_{2p+1,d} f$ and $Q_{2p+1,d} f^+ = 0$. Likewise, for the even order terms in the condensed
harmonic expansions of these functions, $Q_{2p,d} f^+ = Q_{2p,d} f$ and $Q_{2p,d} f^- = 0$ hold. We conclude that
the sum of the odd order terms in the condensed harmonic expansion corresponds to $f^-$ and that of the
even order terms to $f^+$. As anticipated from the analysis of the $d = 2$ case, the operator $\mathcal{H}$
reduces the even part of $f_\beta$ to a constant $1/2$, therefore Fourier-Laplace series expansions for
$f_\beta$ derived later involve only odd order terms.

We now provide a formula that is later used to obtain our estimator for $f_\beta$. If a non-negative
function $f$ has its support included in some hemisphere of $\mathbb{S}^{d-1}$ then

(2.14)  $f(x) = 2 f^-(x)\, \mathbb{I}\{f^-(x) > 0\}.$

Denote the support of $f$ by $\mathrm{supp} f$ and let $-\mathrm{supp} f = \{x : -x \in \mathrm{supp} f\}$, then this
formula follows from the fact that $f^-(x) = f^+(x) \geq 0$ on $\mathrm{supp} f$ while $f^-(x) = -f^+(x) \leq 0$ on
$-\mathrm{supp} f$ and both $f^-$ and $f^+$ are 0 on $\mathbb{S}^{d-1} \setminus (\mathrm{supp} f \cup -\mathrm{supp} f)$.
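Formula (2.14) can be checked on the circle, where the antipode of the point with angle $\phi$ is the point with angle $\phi + \pi$. The Python sketch below uses a hypothetical density supported on a half circle; the density and the helper names are assumptions made for the example.

```python
import math

def f(phi):
    """Hypothetical density supported on the half circle |phi| < pi/2."""
    phi = (phi + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
    return (2.0 / math.pi) * math.cos(phi) ** 2 if abs(phi) < math.pi / 2 else 0.0

def odd_part(g, phi):
    """Odd part (g(b) - g(-b)) / 2; on the circle -b has angle phi + pi."""
    return 0.5 * (g(phi) - g(phi + math.pi))

def even_part(g, phi):
    """Even part (g(b) + g(-b)) / 2."""
    return 0.5 * (g(phi) + g(phi + math.pi))

def recover(g, phi):
    """Right-hand side of (2.14): twice the odd part where it is positive, zero elsewhere."""
    g_minus = odd_part(g, phi)
    return 2.0 * g_minus if g_minus > 0 else 0.0

grid = [2.0 * math.pi * j / 200 for j in range(200)]
max_gap = max(abs(recover(f, t) - f(t)) for t in grid)
```

The odd part alone therefore determines the density as long as the support restriction holds, which is what the identification argument relies on.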

Remark 2.2. If $f$ is a probability density function, the coefficient of degree 0 in the expansion of $f$
on surface harmonics is $1/|\mathbb{S}^{d-1}|$. Conversely, any harmonic polynomial or series such that its
degree 0 coefficient is $1/|\mathbb{S}^{d-1}|$ integrates to one.

The next theorem shows that Fourier-Laplace series on the sphere is a natural tool for the
study of the operator $\mathcal{H}$.

Theorem 2.2 (Funk-Hecke Theorem). If $g$ belongs to $H^{n,d}$ for some $n$, and a function $F$ on $(-1, 1)$
satisfies

$\int_{-1}^{1} |F(t)|^2 (1 - t^2)^{(d-3)/2}\,dt < \infty,$

then

(2.15)  $\int_{\mathbb{S}^{d-1}} F(x'y)\, g(y)\,d\sigma(y) = \lambda_n(F)\, g(x)$

where

$\lambda_n(F) = \frac{|\mathbb{S}^{d-2}|}{C_n^{\nu(d)}(1)} \int_{-1}^{1} F(t)\, C_n^{\nu(d)}(t)\, (1 - t^2)^{(d-3)/2}\,dt.$

In other words, the kernel operator defined by

$f \in L^2(\mathbb{S}^{d-1}) \mapsto \Big(x \mapsto \int_{\mathbb{S}^{d-1}} F(x'y)\, f(y)\,d\sigma(y)\Big) \in L^2(\mathbb{S}^{d-1})$

is, in the subspace $H^{n,d}$, equivalent to the multiplication by $\lambda_n(F)$. Thus a basis of surface
harmonics diagonalizes an integral operator if its kernel is a function of the scalar product $x'y$.

Remark 2.3. Healy and Kim (1996) use Fourier-Laplace expansions to analyze a deconvolution problem
on $\mathbb{S}^2$. As we shall see below, the Addition Formula along with condensed harmonic expansions
provides a general treatment that works for arbitrary dimensions.

2.3. The Hemispherical Transform. The hemispherical transform $\mathcal{H}$, defined by
$\mathcal{H} f(x) = \int_{\mathbb{S}^{d-1}} \mathbb{I}\{x'y > 0\}\, f(y)\,d\sigma(y)$, plays a central role in our analysis.
It is a special case of the operator considered in the Funk-Hecke theorem above, with
$F(t) = \mathbb{I}\{t \in [0, 1]\}$, therefore the next proposition follows.

Notation. We define $\lambda(n, d) = \lambda_n(\mathbb{I}\{t \in [0, 1]\})$ for $d \geq 3$ and
$\lambda(n, 2) = 2\sin(n\pi/2)/n$.

Proposition 2.4. When $d > 2$, the coefficients $\lambda(n, d)$ have the following expressions
(i) $\lambda(0, d) = \frac{|\mathbb{S}^{d-1}|}{2}$
(ii) $\lambda(1, d) = \frac{|\mathbb{S}^{d-2}|}{d - 1}$
(iii) $\forall p \in \mathbb{N} \setminus \{0\}$, $\lambda(2p, d) = 0$
(iv) $\forall p \in \mathbb{N} \setminus \{0\}$, $\lambda(2p + 1, d) = \frac{(-1)^p\, |\mathbb{S}^{d-2}|\, 1 \cdot 3 \cdots (2p - 1)}{(d - 1)(d + 1) \cdots (d + 2p - 1)}$.

For the sake of completeness we give a simple proof of this result in the appendix (see also
Groemer (1996) and Rubin (1999)). Define $L^2_{odd}(\mathbb{S}^{d-1})$ and $H^s_{odd}(\mathbb{S}^{d-1})$ as the
restrictions of $L^2(\mathbb{S}^{d-1})$ and $H^s(\mathbb{S}^{d-1})$ to odd functions and similarly
$L^2_{even}(\mathbb{S}^{d-1})$ and $H^s_{even}(\mathbb{S}^{d-1})$ for even functions. The following corollary is a
direct consequence of the Funk-Hecke Theorem and Proposition 2.4 and corresponds to an observation made
in Remark 2.1 for the $d = 2$ case.

Corollary 2.1. The null space of the hemispherical transform $\mathcal{H}$ is given by

$\ker \mathcal{H} = \bigoplus_{p=1}^{\infty} H^{2p,d} = \Big\{ f \in L^2_{even}(\mathbb{S}^{d-1}) : \int_{\mathbb{S}^{d-1}} f(x)\,d\sigma(x) = 0 \Big\},$

when $\mathcal{H}$ is viewed as an operator on $L^2(\mathbb{S}^{d-1})$. The spaces $H^{0,d}$ and $H^{2p+1,d}$ for
$p \in \mathbb{N}$ are the eigenspaces associated with the non-zero eigenvalues of $\mathcal{H}$.
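The eigenvalue structure can be checked numerically in the case $d = 3$, where the Gegenbauer polynomials with $\nu = 1/2$ are the Legendre polynomials, the weight $(1 - t^2)^{(d-3)/2}$ equals 1, and the standard Funk-Hecke constant is $|\mathbb{S}^1| = 2\pi$. The Python sketch below is an illustration under these assumptions; the function names and the midpoint quadrature are choices made for the example.

```python
import math

def legendre(n, t):
    """Legendre polynomial P_n(t) via (k + 1) P_{k+1} = (2k + 1) t P_k - k P_{k-1}."""
    p_prev, p = 1.0, t
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * t * p - k * p_prev) / (k + 1)
    return p

def hemispherical_eigenvalue(n, m=5000):
    """Funk-Hecke coefficient of F = I{t in [0, 1]} for d = 3: 2 pi int_0^1 P_n(t) dt."""
    h = 1.0 / m  # midpoint rule on [0, 1]; P_n(1) = 1, so no further normalization
    integral = h * sum(legendre(n, (j + 0.5) * h) for j in range(m))
    return 2.0 * math.pi * integral

eigenvalues = [hemispherical_eigenvalue(n) for n in range(6)]
```

Up to quadrature error the even degrees 2 and 4 give zero, while degrees 0, 1, 3 and 5 give $2\pi$, $\pi$, $-\pi/4$ and $\pi/8$, so only the constant and the odd-degree components survive the transform.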

As a consequence of Proposition 2.4, $\mathcal{H}$ is not injective and restrictions have to be imposed
in order to ensure identification of $f_\beta$. Section 3 presents sufficient conditions that allow us to
reconstruct $f_\beta$ from $f_\beta^-$.

The following proposition can be found in Rubin (1999).

Proposition 2.5. $\mathcal{H}$ is a bijection from $L^2_{odd}(\mathbb{S}^{d-1})$ to $H^{d/2}_{odd}(\mathbb{S}^{d-1})$.

We can also easily check (see the proof in the appendix) that

Proposition 2.6. For all $s > 0$, there exist positive constants $C_l$ and $C_u$ such that for all $f$ in
$H^s(\mathbb{S}^{d-1})$,

$C_l \|f^-\|_{2,s} \leq \|\mathcal{H}(f^-)\|_{2,s+d/2} \leq C_u \|f^-\|_{2,s}.$

The factor $d/2$ corresponds to the degree of "regularization" due to smoothing by $\mathcal{H}$. Now the
inverse of an odd function $f^-$ is given by

(2.16)  $\mathcal{H}^{-1}(f^-)(y) = \sum_{p=0}^{\infty} \frac{1}{\lambda(2p+1, d)} \int_{\mathbb{S}^{d-1}} q_{2p+1,d}(x, y)\, f^-(x)\,d\sigma(x).$

This is straightforward given our results at hand: for example, operate $\mathcal{H}$ on the RHS to see:

$\mathcal{H}\Big( \sum_{p=0}^{\infty} \frac{1}{\lambda(2p+1, d)} \int_{\mathbb{S}^{d-1}} q_{2p+1,d}(x, \star)\, f^-(x)\,d\sigma(x) \Big) = \sum_{p=0}^{\infty} \frac{1}{\lambda(2p+1, d)}\, \mathcal{H} Q_{2p+1,d} f^-$

$= \sum_{p=0}^{\infty} Q_{2p+1,d} f^-$  (by the Funk-Hecke Theorem)

$= f^-.$

If $f^-$ belongs to $H^{d/2}_{odd}(\mathbb{S}^{d-1})$, then $\mathcal{H}^{-1}(f^-)$ is a well-defined
$L^2(\mathbb{S}^{d-1})$ function. Otherwise it should be understood as a distribution and is only defined in a
Sobolev space with negative exponent. Moreover, if $d$ is a multiple of 4, it is possible to relate the
inverse of the operator $\mathcal{H}$ with differentiation as in the case of $d = 2$:




Proposition 2.7. If d is a multiple of 4, 

d/4 

n- 1 = \§ d - 2 \ Y[[-& s + 2(k - i)(d - 2k)}. 

k=l 

See the appendix for the proof. This connection between the inverse of $\mathcal{H}$ and differentiation
suggests that a Bernstein-type inequality might hold for $\mathcal{H}^{-1}$. Indeed, even though the above
inversion formula is concerned with $d$'s that are multiples of 4, the following Bernstein inequality holds
for every dimension.

Theorem 2.3 (Bernstein inequality). For every $d \geq 2$ and every $q \in [1, \infty]$, there exists a positive
constant $B(d, q)$ such that for all $P$ in $\bigoplus_{p=0}^{T} H^{2p+1,d}$,

(2.17)  $\|\mathcal{H}^{-1} P\|_q \leq B(d, q)\, T^{d/2}\, \|P\|_q.$

This result is proved in the appendix. It is important for our subsequent analysis of the estimation of
the random coefficients density.

Rubin (1999) gives other inversion formulas for the hemispherical transform in terms of
differential operators. The fact that the inversion roughly corresponds to differentiation is another
manifestation of the ill-posedness of our problem at hand. The inverse operator $\mathcal{H}^{-1}$ is indeed
unbounded. We call the factor $d/2$ in (2.17) the degree of ill-posedness of the inverse problem. For the
case $q = 2$, there exists a lower bound in (2.17) of order $T^{d/2}$ as well, implying that the
upper bound $T^{d/2}$ in the order of $T$ obtained in Theorem 2.3 is tight.

3. General Results 

3.1. Identification in the Random Coefficient Model. In this section we address the following
two questions:

(Q1) Under what conditions is $f_\beta$ identified?

(Q2) Does the random coefficients model impose restrictions?

Let us start with question (Q1). As noted in Section 2.3, operating $\mathcal{H}$ reduces the even part of a
function to a constant, and therefore it is impossible to recover $f_\beta^+$ from the knowledge of $r$, which is
what observations offer. Our identification strategy is therefore as follows: (Step 1) assume conditions
that guarantee the identification of $f_\beta^-$; then (Step 2) show that $f_\beta$ is uniquely determined from $f_\beta^-$
under a reasonable assumption. We first consider Step 1. Define $H^+ = H(n) = \{x \in \mathbb{S}^{d-1} : x'n \ge 0\}$,
where $n = (1, 0, \ldots, 0)'$, that is, the northern hemisphere of $\mathbb{S}^{d-1}$. For later use, also define its southern
hemisphere $H^- = H(-n)$. Since the model we consider has a constant as the first element of the
covariate vector before normalization, the same vector after normalization is necessarily an element
of $H^+$. We make the following assumption, which also appears in Ichimura and Thompson (1998),
and show that it achieves Step 1.

Assumption 3.1. The support of $X$ is $H^+$.

This assumption demands that $\tilde X$, the vector of non-constant covariates in the original scale, is
supported on the whole space $\mathbb{R}^{d-1}$. It rules out discrete or bounded covariates; see Section 6 for a
potential approach to deal with regressors with limited support. In what follows we assume that the
law of $X$ is absolutely continuous with respect to $\sigma$ and denote its density by $f_X$.

Step 1 of our identification argument is to show that the knowledge of $r(x)$ on $H^+$, which is
available under Assumption 3.1, identifies $f_\beta^-$. The problem at hand calls for solving
$r = \mathcal{H} f_\beta = \frac{1}{2} + \mathcal{H} f_\beta^-$ for $f_\beta^-$, and the inversion formula derived in (2.16) is potentially useful for the purpose.
A direct application of the formula to $r$ is inappropriate, however, since it requires integration of $r$
on the whole sphere $\mathbb{S}^{d-1}$, but $r$ is defined only on $H^+$ even when $X$ has full support on $H^+$. An
appropriate extension of $r(x)$, $x \in H^+$, to the entire $\mathbb{S}^{d-1}$ is in order. Using the random coefficients
model (1.1) and Assumption 1.1, then noting that $f_\beta$ is a probability density function, conclude

$$\mathcal{H}(f_\beta)(-x) = \int_{H(-x)} f_\beta(b)\, d\sigma(b) = 1 - \mathcal{H}(f_\beta)(x) = 1 - r(x) \tag{3.1}$$

for $x$ in $H^+$. This suggests an extension $R$ of $r$ to $\mathbb{S}^{d-1}$ as follows:

$$\forall x \in H^+,\ R(x) = r(x), \quad \text{and} \quad \forall x \in H^-,\ R(x) = 1 - r(-x) = 1 - R(-x). \tag{3.2}$$

The function $R$ is well-defined on the whole sphere under Assumption 3.1. Later we derive a formula
for $f_\beta$ in terms of $R(x)$, $x \in \mathbb{S}^{d-1}$, which shows the identifiability of $f_\beta$ under Assumption 3.1.
Note that

$$\begin{aligned}
R(x) &= R^+(x) + R^-(x) \\
&= \tfrac{1}{2}\,[R(x) + R(-x)] + R^-(x) \\
&= \tfrac{1}{2}\,[R(x) + (1 - R(x))] + R^-(x) \quad \text{by (3.2)} \\
&= \tfrac{1}{2} + R^-(x),
\end{aligned} \tag{3.3}$$

thus $R$ is completely determined by its odd part and therefore

$$R(x) = \tfrac{1}{2} + \mathcal{H}(f_\beta^-)(x),$$

or

$$R^- = \mathcal{H} f_\beta^-. \tag{3.4}$$

We can invert this equation to obtain $f_\beta^-$.
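The extension step (3.2) and the decomposition (3.3) are easy to mimic numerically. The following is a minimal sketch, assuming a hypothetical choice probability `r_example` defined on the northern hemisphere of the sphere in $\mathbb{R}^3$; neither the function nor its form comes from the paper. It only illustrates that the extended function has even part $1/2$ by construction.

```python
import numpy as np

def r_example(x):
    """Hypothetical [0, 1]-valued choice probability, used only on H+."""
    return 0.5 + 0.4 * np.tanh(x[1] + x[2] ** 2)

def extend(r, x):
    """Extension (3.2): R = r on H+, and R(x) = 1 - r(-x) on H-."""
    x = np.asarray(x, dtype=float)
    return r(x) if x[0] >= 0 else 1.0 - r(-x)

def even_odd_parts(r, x):
    """Even and odd parts of the extension R at the point x."""
    x = np.asarray(x, dtype=float)
    r_plus, r_minus = extend(r, x), extend(r, -x)
    return 0.5 * (r_plus + r_minus), 0.5 * (r_plus - r_minus)

rng = np.random.default_rng(0)
point = rng.standard_normal(3)
point /= np.linalg.norm(point)          # a point on the unit sphere
even, odd = even_odd_parts(r_example, point)
print(even, extend(r_example, point) - odd)  # both equal 1/2 up to rounding
```

Points exactly on the boundary $x'n = 0$ are not covered by this check; there the extension is consistent only if $r$ satisfies a matching condition at antipodal boundary points.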

Now we turn to Step 2 in our identification argument. Obviously $f_\beta^-$ does not uniquely
determine $f_\beta$ without further assumptions. This is a fundamental identification problem in our model.
We need to identify $f_\beta$ from the choice probability function $r$, but we can choose an appropriate even
function $g$ so that $f_\beta + g$ is a legitimate density function (see the proof of Proposition 3.1 for such a
construction). Then $r = \mathcal{H}(f_\beta + g)$, and the knowledge of $r$ identifies $f_\beta$ only up to such a function $g$.
Ichimura and Thompson (1998, Theorem 1) give a set of conditions that imply the identification of the
model (1.1). One of their assumptions postulates that there exists $c$ on $\mathbb{S}^{d-1}$ such that $P(c'\beta > 0) = 1$.
This, in our terminology, means that:

Assumption 3.2. The support of $\beta$ is a subset of some hemisphere.

As noted by Ichimura and Thompson (1998), Assumption 3.2 does not seem too stringent
in many economic applications. It is often reasonable to assume that an element of the random
coefficients vector, such as a price coefficient, has a known sign. If the $j$-th element of $\beta$ has a known
sign (say, positive), then Assumption 3.2 holds with $c$ being a unit vector with its $j$-th element being 1.
This is a case in which the location of the hemisphere in Assumption 3.2 is known a priori, though the
knowledge about its location is not necessary for identification. Assumption 3.2 implies the following
mapping from $f_\beta^-$ to $f_\beta$ developed in (2.14):

$$f_\beta(b) = 2 f_\beta^-(b)\, I\{f_\beta^-(b) > 0\}. \tag{3.5}$$

This is useful because it shows that Assumption 3.2 guarantees identification if $f_\beta^-$ is identified.
Moreover, it will be used in the next section to develop a key formula that leads to a simple and
practical estimator for $f_\beta$ that is guaranteed to be non-negative.

Remark 3.1. Assumption 3.2 is testable since it imposes restrictions on $f_\beta^-$, which is identified under
weak conditions. For example, for values of $b$ with $f_\beta^-(b) > 0$, $f_\beta^-(-b) < 0$ must hold. Or, it implies
that $f_\beta^-$ integrates to $1/(2|\mathbb{S}^{d-1}|)$ on a hemisphere $H(x)$ for some $x$, and $-1/(2|\mathbb{S}^{d-1}|)$ on the other
$H(-x)$.
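The mapping (3.5) from the odd part back to the density can be illustrated on the circle ($d = 2$). The sketch below assumes a hypothetical density supported on the half-circle $|\theta| < \pi/2$, so that Assumption 3.2 holds; the density and the function names are illustrative only.

```python
import numpy as np

def density(theta):
    """Hypothetical density on the circle, supported on |theta| < pi/2."""
    t = (np.asarray(theta, dtype=float) + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return np.where(np.abs(t) < np.pi / 2, (2 / np.pi) * np.cos(t) ** 2, 0.0)

def odd_part(f, theta):
    """Odd part f^-(b) = (f(b) - f(-b)) / 2; on the circle, -b is theta + pi."""
    return 0.5 * (f(theta) - f(np.asarray(theta) + np.pi))

def recover_density(f_minus_values):
    """Mapping (3.5): f = 2 f^- 1{f^- > 0}."""
    return 2.0 * f_minus_values * (f_minus_values > 0)

grid = np.linspace(-np.pi, np.pi, 1001)
recovered = recover_density(odd_part(density, grid))
print(np.max(np.abs(recovered - density(grid))))  # maximal recovery error on the grid
```

Because the support lies in one hemisphere, at most one of $f_\beta(b)$ and $f_\beta(-b)$ is nonzero, so the positive part of $2 f_\beta^-$ reproduces $f_\beta$ exactly.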
The following proposition answers question (Q2), and a proof is given in the appendix.

Proposition 3.1. A $[0,1]$-valued function $r$ is compatible with the random coefficient model (1.1)
with $f_\beta$ in $L^2(\mathbb{S}^{d-1})$ and Assumption 1.1 if and only if $r$ is homogeneous of degree 0 and its extension
$R$ according to (3.2) belongs to $H^{d/2}(\mathbb{S}^{d-1})$.

The global smoothness assumption that $R$ belongs to $H^{d/2}(\mathbb{S}^{d-1})$ imposes substantial restrictions
on the property of observables, that is, the behavior of the choice probability function $r$. Note that
the smoothness condition in this proposition is stated in terms of $R$, and even if the choice probability
function $r$ is sufficiently smooth on the support of $X$, which is $H^+$, it is not necessarily consistent with
the random coefficient binary choice model (1.1) unless its extension is smooth globally on $\mathbb{S}^{d-1}$. In
particular, the Sobolev embedding of $H^s(\mathbb{S}^{d-1})$ into the space of continuous functions for $s > (d-1)/2$
implies that if the extension $R$ is in $H^{d/2}(\mathbb{S}^{d-1})$, it has to be continuous on $\mathbb{S}^{d-1}$. This, in turn, means
that the corresponding $r$ has to satisfy certain matching conditions at a boundary point $x$ of $H^+$ (i.e.
$x'n = 0$) and its opposite point $-x$.

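3.2. Nonparametric Estimation of $f_\beta$. If an appropriate estimator $\hat R^-$ of $R^-$ is available, an
application of the inversion formula (2.16) to (3.4) suggests the following estimator for $f_\beta^-$:

$$\hat f_\beta^- = \mathcal{H}^{-1}(\hat R^-) = \sum_{p=0}^{\infty} \frac{1}{\lambda(2p+1,d)} \int_{\mathbb{S}^{d-1}} q_{2p+1,d}(\cdot,x)\, \hat R^-(x)\, d\sigma(x). \tag{3.6}$$

Then use the mapping (3.5) to define

$$\hat f_\beta(b) = 2 \hat f_\beta^-(b)\, I\{\hat f_\beta^-(b) > 0\} \tag{3.7}$$

as an estimator for $f_\beta$. Proposition 2.6 implies that if $\hat f_\beta^- - f_\beta^- \in H^s(\mathbb{S}^{d-1})$ then $\hat R^- - R^- \in H^\sigma(\mathbb{S}^{d-1})$,
$\sigma = s + \frac{d}{2}$, and for $v \in [0, s]$,

$$\|\hat f_\beta^- - f_\beta^-\|_{2,v} \asymp \|\hat R^- - R^-\|_{2,v+d/2}. \tag{3.8}$$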
As discussed earlier, the estimation of $f_\beta$ is related to deconvolution on $\mathbb{S}^{d-1}$, and the degree of
ill-posedness in our model is $d/2$, which is indeed the rate at which the eigenvalues $\lambda(n,d)$, $n = 2p+1$, $p \in \mathbb{N}$,
converge to zero as $n$ grows, as shown in (9.11). Existing results for deconvolution problems (see,
for example, Fan, 1991 and Kim and Koo, 2000) then suggest that we should be able to estimate $f_\beta$
at the rate $N^{-\frac{s}{2s+2d-1}}$ in $L^2(\mathbb{S}^{d-1})$ provided that $f_\beta \in H^s(\mathbb{S}^{d-1})$. The relationship (3.8), evaluated
at $v = 0$, implies that this can be achieved if we can estimate $R$ at the rate $N^{-\frac{\sigma - d/2}{2\sigma+d-1}}$ in the $\|\cdot\|_{2,d/2}$
norm. The latter is the usual nonparametric rate for estimation of densities on $d-1$ dimensional
smooth submanifolds of $\mathbb{R}^d$ (see, for example, Hendriks, 1990).

The estimation formula given in (3.6) is natural and reasonable, though it typically requires
numerical evaluation of integrals to implement it. Moreover, in practice one needs to evaluate the
infinite sum in (3.6), for example, by truncating the series. This results in a general estimator that
can be written in the following two equivalent forms

$$\hat f_\beta^- = \mathcal{H}^{-1}(P_{T_N} \hat R^-) = \sum_{p=0}^{T_N} \frac{1}{\lambda(2p+1,d)} \int_{\mathbb{S}^{d-1}} q_{2p+1,d}(\cdot,x)\, \hat R^-(x)\, d\sigma(x) \tag{3.9}$$

for suitably chosen $T_N$ that goes to infinity with $N$. The sequence $\mathcal{H}^{-1} \circ P_{T_N}$, $N = 1, 2, \ldots$, can be
interpreted as regularized inverses of $\mathcal{H}$, with the spectral cut-off method often used in statistical
inverse problems. The next section gives an example of an estimator $\hat R^-$ that implies a very simple
closed form expression for $\hat f_\beta^-$ that avoids numerical evaluation of the integrals in (3.6).

4. Estimators for the Choice Probability Function 

This section considers estimation of the choice probability function $r$ and its extension $R$. We
propose an estimator for $r$, which, in turn, yields a computationally simple estimator for $f_\beta$. Also the
asymptotic results presented here are useful for the next section where we study the limiting properties
of our estimator for the random coefficients density $f_\beta$.

Since $R$ is square integrable on $\mathbb{S}^{d-1}$, it has a condensed harmonic expansion which enables us
to obtain the expressions in the next theorem.

Theorem 4.1. For $x$ in $\mathbb{S}^{d-1}$ we have

$$R(x) = \frac{1}{2} + E\left[\frac{(2Y - 1)}{f_X(X)} \sum_{p=0}^{\infty} q_{2p+1,d}(X,x)\right]. \tag{4.1}$$

This suggests an estimator of the form $\hat R(x) = \frac{1}{2} + \hat R^-(x)$ with

$$\hat R^-(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{(2y_i - 1)}{\hat f_X(x_i)} \sum_{p=0}^{T_N} q_{2p+1,d}(x_i,x)$$

where $\hat f_X$ is an estimator of $f_X$ and $T_N$ is a suitably chosen sequence diverging to infinity with $N$.
Note that the second summation corresponds to the Dirichlet kernel. We can generalize this, by
introducing a class of estimators of the form

$$\hat R^-(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{(2y_i - 1)\, K^-_{2T_N}(x_i,x)}{\hat f_X(x_i)} \tag{4.2}$$

where $K^-_{2T_N}$ is the odd part of a kernel of the form (2.13) satisfying Assumption 2.1 such as the two
kernels in Proposition 2.3.

The estimator (4.2) is convenient, though the plug-in term $\hat f_X$ has to be treated with care. We
avoid restrictive assumptions on the distributions of covariates and allow $f_X(x)$ to decay to zero as $x$
approaches the boundary of its support $H^+$. To deal with the latter problem, we modify (4.2) using
a trimming factor to define

$$\hat R^-(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{(2y_i - 1)\, K^-_{2T_N}(x_i,x)}{\max\left(\hat f_X(x_i), a_N\right)} \tag{4.3}$$

where $a_N$ is a sequence of the form

$$a_N = (\log N)^{-r} \tag{4.4}$$

for some positive $r$. Our estimator for $R$ is then

$$\hat R = \frac{1}{2} + \hat R^-. \tag{4.5}$$

Remark 4.1. Alternative estimators of $R^-$ are available. For example, one may use kernel regression
on the sphere to estimate $r$ in order to obtain an estimator for $R^-$. As noted before, however, we
then need to use numerical integration to evaluate (3.9) to calculate $\hat f_\beta^-$.

Various nonparametric estimators for $f_X$ can be used in (4.3), since estimation of densities
on compact manifolds has been studied by several authors, using histograms (Ruymgaart (1989)),
projection estimators (see, e.g. Devroye and Gyorfi (1985) for the circle and Hendriks (1990) for
general compact Riemannian manifolds) or kernel estimators (see, e.g. Devroye and Gyorfi (1985) for
the case of the circle, and Hall et al. (1987) and Klemela (2000) for higher dimensional spheres). We
now assume that the following holds for $f_X$ and its estimator $\hat f_X$.

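Assumption 4.1. Suppose for $q$ and $\sigma$ that will be specified later

(i)
$$\left(\frac{N}{(\log N)^{2r + (1 - 2/q) I\{q \ge 2\}}}\right)^{\frac{\sigma + (d-1)(1 - 1/q)}{2\sigma + d - 1}} \sigma\left(\left\{0 < f_X < (\log N)^{-r}\right\}\right) = O(1), \qquad \sup_{x \in H^+} f_X(x) < \infty$$

holds for some $r > 0$, and $f_X$ and $\hat f_X$ satisfy either

(ii)
$$\left(\frac{N}{(\log N)^{2r + (1 - 2/q) I\{q \ge 2\}}}\right)^{\frac{\sigma}{2\sigma + d - 1}} (\log N)^{r} \max_{i=1,\ldots,N} \left|\frac{\max\left(\hat f_X(x_i), (\log N)^{-r}\right)}{\max\left(f_X(x_i), (\log N)^{-r}\right)} - 1\right| = o_P(1)$$

or

(iii) for some constant $C$,
$$\limsup_{N \to \infty} \left(\frac{N}{(\log N)^{2r+1}}\right)^{\frac{\sigma}{2\sigma + d - 1}} (\log N)^{r} \max_{i=1,\ldots,N} \left|\frac{\max\left(\hat f_X(x_i), (\log N)^{-r}\right)}{\max\left(f_X(x_i), (\log N)^{-r}\right)} - 1\right| \le C \quad \text{a.s.}$$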

Assumption 4.1 (ii) (or (iii)) can be met easily when $f_X$ is smooth enough. In the simulation
experiment we use

$$\hat f_X(x) = \max\left(\frac{1}{N} \sum_{i=1}^{N} K_{T_N'}(x_i, x),\ 0\right) \tag{4.6}$$

for a suitably chosen $T_N'$ that depends on the sample size and the smoothness of $f_X$, and $K_{T_N'}$ is a
kernel of the form (2.13) satisfying Assumption 2.1. Theoretical details of this estimator will appear
elsewhere but note that its rate of convergence in sup-norm can be obtained in the same manner
as the proof of Theorem 5.1. This estimator is in the spirit of the projection estimators of Hendriks
(1990), but here we are able to derive a closed form using the condensed harmonic expansions together
with the Addition Formula. Note also that $K_{T_N'}$ is a smoothed projection kernel (note the factor $\chi$
in (2.13)), which is used here in order to have good approximation properties in the $L^q(\mathbb{S}^{d-1})$ norms
with arbitrary $q \in [1,\infty]$, in particular in the $L^\infty(\mathbb{S}^{d-1})$ norm.

We now present asymptotic properties of the estimators for $R$. The proofs are very similar to
those of Theorems 5.1 and 5.2 of Section 5 given in the appendix and thus omitted. We first state
results on the rate of convergence, including the strong uniform convergence rate. Apart from the log
correction due to trimming of $\hat f_X$, the rate is comparable to the usual nonparametric rates.



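Theorem 4.2 (Convergence rates in $L^q(\mathbb{S}^{d-1})$). Suppose Assumptions 2.1, 3.1, 4.1 (i) and 4.1 (ii) hold.
If $R$ belongs to $W^\sigma_q(\mathbb{S}^{d-1})$ with $q$ in $[1,\infty]$ and $\sigma$ positive, and $T_N$ satisfies

$$T_N \asymp \left(\frac{N}{(\log N)^{2r + (1 - 2/q) I\{q \ge 2\}}}\right)^{\frac{1}{2\sigma + d - 1}}$$

then

$$\left(\frac{N}{(\log N)^{2r + (1 - 2/q) I\{q \ge 2\}}}\right)^{\frac{\sigma}{2\sigma + d - 1}} \left\|\hat R - R\right\|_q = O_P(1).$$

Moreover, if Assumptions 4.1 (i) and 4.1 (iii) hold then there exists a constant $C$ such that

$$\limsup_{N \to \infty} \left(\frac{N}{(\log N)^{2r+1}}\right)^{\frac{\sigma}{2\sigma + d - 1}} \left\|\hat R - R\right\|_\infty \le C \quad \text{a.s.}$$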

Assumption 4.1 (i) is used to achieve a rate of convergence logarithmically close to the desired
nonparametric rate $N^{-\frac{\sigma}{2\sigma+d-1}}$. Relaxing it while still keeping the exponent $\frac{\sigma}{2\sigma+d-1}$ (up to a logarithmic
term) seems difficult.

Next we consider asymptotic normality: 

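Theorem 4.3 (Asymptotic normality of $\hat R$). Suppose $R$ belongs to $W^\sigma_\infty(\mathbb{S}^{d-1})$ with $\sigma$ positive and
Assumptions 2.1 and 3.1 hold. If $f_X$, $\hat f_X$, $T_N$ and $r$ satisfy

$$N^{1/2}\, T_N^{-(d-1)/2} (\log N)^{r} \max_{i=1,\ldots,N} \left|\frac{\max\left(\hat f_X(x_i), (\log N)^{-r}\right)}{\max\left(f_X(x_i), (\log N)^{-r}\right)} - 1\right| = o_P(1), \tag{4.7}$$

$$N^{-1/2}\, T_N^{(d-1)/2} (\log N)^{r+\epsilon} = o(1) \quad \text{for some arbitrary } \epsilon > 0, \tag{4.8}$$

$$N^{1/2}\, T_N^{-\frac{2\sigma + d - 1}{2}} = o(1), \tag{4.9}$$

$$N^{1/2}\, T_N^{(d-1)/2}\, \sigma\left(\left\{0 < f_X < (\log N)^{-r}\right\}\right) = o(1), \qquad \sup_{x \in H^+} f_X(x) < \infty, \tag{4.10}$$

then

$$N^{1/2}\, s_N^{-1}(x) \left(\hat R(x) - R(x)\right) \xrightarrow{d} N(0,1)$$

where

$$s_N^2(x) := \operatorname{var}\left(\frac{(2Y - 1)\, K^-_{2T_N}(X,x)}{\max\left(f_X(X), (\log N)^{-r}\right)}\right).$$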

The lower bound for the rate of $T_N$ implied by (4.9) is faster than the optimal rate
(undersmoothing). This ensures that the approximation bias vanishes asymptotically. Condition (4.7)
guarantees that the effect of replacing $f_X$ with $\hat f_X$ is also asymptotically negligible. Viewed as a
condition on $f_X$ and $\hat f_X$, it becomes more stringent as the rate for $T_N$ gets slower, but as far as

$$N^{\frac{\sigma}{2\sigma + d - 1}} (\log N)^{r} \max_{i=1,\ldots,N} \left|\frac{\max\left(\hat f_X(x_i), (\log N)^{-r}\right)}{\max\left(f_X(x_i), (\log N)^{-r}\right)} - 1\right| = o_P(1)$$

holds, every $T_N$ that satisfies the lower bound (4.9) automatically fulfills (4.7). On the other hand,
(4.8) imposes an upper bound for the growth rate of the parameter $T_N$. It is a technical condition
under which the Lyapounov condition for asymptotic normality holds. Also, we impose (4.10), under
which the bias due to trimming is asymptotically negligible. It becomes increasingly more restrictive
as the growth rate for $T_N$ rises.

5. A Closed Form Estimator of $f_\beta$

This section presents a computationally convenient estimator for $f_\beta$, and shows that it has
desirable asymptotic properties. It is based on an estimator for $f_\beta^-$ of the form



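$$\hat f_\beta^- = \mathcal{H}^{-1}\left(\hat R^-\right) = \mathcal{H}^{-1}\left(\frac{1}{N} \sum_{i=1}^{N} \frac{(2y_i - 1)\, K^-_{2T_N}(x_i, \cdot)}{\max\left(\hat f_X(x_i), (\log N)^{-r}\right)}\right).$$

Computing $\hat f_\beta^-$ is straightforward. First, note that the estimator (4.3) for $R^-$ resides in a finite
dimensional space $\bigoplus_{p=0}^{T_N - 1} H^{2p+1,d}$, therefore $P_{T_N} \hat R^- = \hat R^-$ holds. Consequently, unlike in (3.9) where
a general estimator for $R^-$ is considered, we do not need to apply any additional series truncation to
$\hat R^-$ prior to the inversion of $\mathcal{H}$. Second, the estimator requires no numerical integration. To see this,
note the formula

$$\mathcal{H}^{-1}\left(K^-_{2T_N}(x_i, \cdot)\right)(b) = \sum_{p=0}^{T_N - 1} \frac{\chi(2p+1, 2T_N)}{\lambda(2p+1,d)}\, q_{2p+1,d}(x_i, b)$$

which follows from

$$\begin{aligned}
\int_{\mathbb{S}^{d-1}} q_{2p+1,d}(x,b)\, K^-_{2T_N}(x, x_i)\, d\sigma(x)
&= \int_{\mathbb{S}^{d-1}} q_{2p+1,d}(x,b) \sum_{p'=0}^{T_N - 1} \chi(2p'+1, 2T_N)\, q_{2p'+1,d}(x, x_i)\, d\sigma(x) \\
&= \chi(2p+1, 2T_N)\, q_{2p+1,d}(b, x_i).
\end{aligned}$$

Thus

$$\begin{aligned}
\hat f_\beta^-(b) &= \frac{1}{N} \sum_{i=1}^{N} \frac{(2y_i - 1)\, \mathcal{H}^{-1}\left(K^-_{2T_N}(x_i, \cdot)\right)(b)}{\max\left(\hat f_X(x_i), (\log N)^{-r}\right)} \\
&= \frac{1}{N} \sum_{i=1}^{N} \frac{(2y_i - 1) \sum_{p=0}^{T_N - 1} \frac{\chi(2p+1, 2T_N)}{\lambda(2p+1,d)}\, q_{2p+1,d}(x_i, b)}{\max\left(\hat f_X(x_i), (\log N)^{-r}\right)}.
\end{aligned}$$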

Using (3.7) and the Addition formula, we arrive at an estimator for $f_\beta$ with the following explicit
form:

$$\hat f_\beta(b) = 2 \hat f_\beta^-(b)\, I\{\hat f_\beta^-(b) > 0\}, \tag{5.1}$$

$$\text{where}\quad \hat f_\beta^-(b) = \frac{1}{|\mathbb{S}^{d-1}|} \sum_{p=0}^{T_N - 1} \frac{\chi(2p+1, 2T_N)\, h(2p+1,d)}{\lambda(2p+1,d)\, C^{\nu(d)}_{2p+1}(1)} \left(\frac{1}{N} \sum_{i=1}^{N} \frac{(2y_i - 1)\, C^{\nu(d)}_{2p+1}(x_i'b)}{\max\left(\hat f_X(x_i), (\log N)^{-r}\right)}\right).$$

This is our main proposal, on which the rest of the paper focuses.

Remark 5.1. Our estimator $\hat f_\beta$ requires neither numerical integration nor optimization. Recall that
$h(n,d) = \frac{(2n+d-2)(n+d-3)!}{n!\,(d-2)!}$, $\nu(d) = (d-2)/2$ and
$\lambda(2p+1,d) = \frac{(-1)^p\, |\mathbb{S}^{d-2}|\, 1 \cdot 3 \cdots (2p-1)}{(d-1)(d+1) \cdots (d+2p-1)}$ by the definition of $h(n,d)$, Theorem
2.1 and Proposition 2.4, respectively, so these are trivial to calculate. As discussed in Section
2.2 the polynomial $C^{\nu(d)}_{2p+1}$ can be evaluated recursively using (2.9). Examples of the specification of $\chi$
are given in Proposition 2.3.
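To make the plug-in nature of (5.1) concrete, here is a minimal sketch specialized to $d = 3$, where $C^{1/2}_n$ is the Legendre polynomial $P_n$, $h(n,3) = 2n+1$, $C^{1/2}_n(1) = 1$ and $|\mathbb{S}^2| = 4\pi$. Several simplifications are assumptions of this sketch rather than the paper's choices: the weights are set to $\chi \equiv 1$ (a plain spectral cut-off, not the Riesz kernel), the density values `f_x` at the sample points are supplied by the user instead of being estimated, and the eigenvalues are computed from $\lambda(2p+1,3) = 2\pi\int_0^1 P_{2p+1}(t)\,dt$ via the ratio $\lambda_p/\lambda_{p-1} = -(2p-1)/(2p+2)$. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def hemispherical_eigenvalue(p):
    """lambda(2p+1, 3): pi for p = 0, then the ratio -(2k-1)/(2k+2) per step."""
    lam = np.pi
    for k in range(1, p + 1):
        lam *= -(2 * k - 1) / (2 * k + 2)
    return lam

def f_beta_minus_hat(b, x, y, f_x, T, r=2.0):
    """Odd-part estimator at points b (m, 3) from data x (N, 3), y (N,), f_x (N,)."""
    n_obs = len(y)
    a_n = np.log(n_obs) ** (-r)                  # trimming level, as in (4.4)
    weights = (2 * y - 1) / np.maximum(f_x, a_n)
    inner = np.clip(b @ x.T, -1.0, 1.0)          # (m, N) inner products x_i'b
    coef = np.zeros(2 * T)                       # Legendre coefficients by degree
    for p in range(T):
        n = 2 * p + 1                            # only odd degrees enter
        coef[n] = (2 * n + 1) / (4 * np.pi * hemispherical_eigenvalue(p))
    return legendre.legval(inner, coef) @ weights / n_obs

def f_beta_hat(b, x, y, f_x, T, r=2.0):
    """Final estimator: twice the positive part of the odd-part estimator."""
    return 2.0 * np.maximum(f_beta_minus_hat(b, x, y, f_x, T, r), 0.0)
```

The whole computation is a pair of matrix products, which is the sense in which no numerical integration or optimization is needed; a smoothed kernel would only change the entries of `coef`.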



The proof of the following result is given in the appendix. 

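Theorem 5.1 (Convergence rates in $L^q(\mathbb{S}^{d-1})$). Suppose Assumptions 2.1, 3.1, 4.1 (i) and 4.1 (ii) with
$\sigma = s + \frac{d}{2}$ hold. If $f_\beta$ belongs to $W^s_q(\mathbb{S}^{d-1})$ with $q$ in $[1,\infty]$ and $s > 0$, and $T_N$ satisfies

$$T_N \asymp \left(\frac{N}{(\log N)^{2r + (1 - 2/q) I\{q \ge 2\}}}\right)^{\frac{1}{2s + 2d - 1}}$$

then

$$\left(\frac{N}{(\log N)^{2r + (1 - 2/q) I\{q \ge 2\}}}\right)^{\frac{s}{2s + 2d - 1}} \left\|\hat f_\beta - f_\beta\right\|_q = O_P(1). \tag{5.2}$$

Moreover, if Assumptions 4.1 (i) and 4.1 (iii) hold then there exists a constant $C$ such that

$$\limsup_{N \to \infty} \left(\frac{N}{(\log N)^{2r+1}}\right)^{\frac{s}{2s + 2d - 1}} \left\|\hat f_\beta - f_\beta\right\|_\infty \le C \quad \text{a.s.} \tag{5.3}$$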

s 

The rate $N^{-\frac{s}{2s+2d-1}}$ is in accordance with the $L^2$ rate in Healy and Kim (1996) who study
deconvolution on $\mathbb{S}^2$ for non-degenerate kernels. Kim and Koo (2000) prove that the rate in Healy
and Kim (1996) is optimal in the minimax sense. Their statistical problem, however, involves neither
a plug-in method nor trimming. Also, somewhat less importantly, it does not cover the case when the
convolution kernel is given by an indicator function, which appears in our operator $\mathcal{H}$. In a recent
important paper, Hoderlein et al. (2007) study a linear model of the form $W = X'\beta$ where $\beta$ is a
$d$-vector of random coefficients. They obtain a nonparametric random coefficients density estimator
that has the rate $N^{-\frac{s}{2s+2d-1}}$ without the log correction² when $f_X$ is assumed to be bounded from
below and thus no trimming is required. Our log correction is closely related to the speed at which
the density $f_X$ decays to zero as $x$ approaches the boundary of $H^+$. Also, our result covers $L^q$ loss
for all $q \in [1,\infty]$.

The next theorem is concerned with pointwise asymptotic normality. The proof is given in the
appendix.

² Note that the dimension of their estimator is $d$, whereas that of ours is $d - 1$. On the other hand, in their problem
$W$ is observable, and it is obviously more informative than our binary outcome $Y$, which causes difficulties both in
identification and estimation.

Theorem 5.2 (Asymptotic normality). Suppose $f_\beta$ belongs to $W^s_\infty(\mathbb{S}^{d-1})$ with $s > 0$, and
Assumptions 2.1 and 3.1 hold. If $f_X$, $\hat f_X$, $T_N$ and $r$ satisfy

$$N^{1/2}\, T_N^{-(d-1)/2} (\log N)^{r} \max_{i=1,\ldots,N} \left|\frac{\max\left(\hat f_X(x_i), (\log N)^{-r}\right)}{\max\left(f_X(x_i), (\log N)^{-r}\right)} - 1\right| = o_P(1), \tag{5.4}$$

$$N^{-1/2}\, T_N^{(d-1)/2} (\log N)^{r+\epsilon} = o(1) \quad \text{for some arbitrary } \epsilon > 0, \tag{5.5}$$

$$N^{1/2}\, T_N^{-\frac{2s + 2d - 1}{2}} = o(1), \tag{5.6}$$

$$N^{1/2}\, T_N^{(d-1)/2}\, \sigma\left(\left\{0 < f_X < (\log N)^{-r}\right\}\right) = o(1), \qquad \sup_{x \in H^+} f_X(x) < \infty, \tag{5.7}$$

then

$$N^{1/2}\, s_N^{-1}(b) \left(\hat f_\beta(b) - f_\beta(b)\right) \xrightarrow{d} N(0,1) \tag{5.8}$$

holds for $b$ such that $f_\beta(b) \ne 0$, where $s_N^2(b) := 4 \operatorname{var}(Z_N(b))$ and
$Z_N(b) = \dfrac{(2Y - 1)\, \mathcal{H}^{-1}\left(K^-_{2T_N}(X, \cdot)\right)(b)}{\max\left(f_X(X), (\log N)^{-r}\right)}$.

Note that the conditions (5.4), (5.5), (5.6) and (5.7) are the same as conditions (4.7), (4.8),
(4.9) and (4.10) in the case of estimation of $R$. To see this for (5.6) it is enough to set $\sigma = s + \frac{d}{2}$. The
standard error $s_N(b)$ is 2 times the standard deviation of

$$Z_N(b) = \frac{1}{|\mathbb{S}^{d-1}|} \sum_{p=0}^{T_N - 1} \frac{\chi(2p+1, 2T_N)\, h(2p+1,d)}{\lambda(2p+1,d)\, C^{\nu(d)}_{2p+1}(1)} \cdot \frac{(2Y - 1)\, C^{\nu(d)}_{2p+1}(X'b)}{\max\left(f_X(X), (\log N)^{-r}\right)}$$

(see equation (5.1)), which can be estimated by replacing $f_X$ with $\hat f_X$.

6. Discussion 

6.1. Estimation of Marginals. In Section 3 we have provided an expression for the estimator of
the full joint density of $\beta$, from which an estimator for a marginal density can be obtained. Let
$\sigma_k$ denote the surface measure and $\underline{\sigma}_k = \sigma_k/|\mathbb{S}^k|$ the uniform probability measure on $\mathbb{S}^k$. We write
$\beta = (\tilde\beta', \bar\beta')'$ and wish to obtain the density of the marginal of $\bar\beta$, a vector of dimension $k$, integrating
out $\tilde\beta$, which is a vector of dimension $d - k$. Also define $\tilde P$ and $\bar P$ the projectors such that $\tilde\beta = \tilde P\beta$ and
$\bar\beta = \bar P\beta$ and denote by $\tilde P_*\underline{\sigma}_{d-1}$ and $\bar P_*\underline{\sigma}_{d-1}$ the direct image probability measures. One possibility is
to define the marginal law of $\bar\beta$ as the measure $\bar P_* P_\beta$, where $dP_\beta = f_\beta\, d\sigma$. This may not be convenient,
however, since the uniform distribution over $\mathbb{S}^{d-1}$ would have U-shaped marginals. The U-shape becomes
more pronounced as the dimension of $\beta$ increases. In order to obtain a flat density for the marginals of
the uniform joint distribution on the sphere it is enough to consider densities with respect to the
dominating measure $\bar P_*\underline{\sigma}_{d-1}$. Notice that sampling $U$ uniformly on $\mathbb{S}^{d-1}$ is equivalent to sampling $\bar U$
according to $\bar P_*\underline{\sigma}_{d-1}$ and then given $\bar U$ forming $\rho(\bar U) V$ where $V$ is a draw from the uniform distribution
$\underline{\sigma}_{d-1-k}$ on $\mathbb{S}^{d-1-k}$ and $\rho(\bar U) = \sqrt{1 - \|\bar U\|^2}$. Indeed given $\bar U$, $\tilde U/\rho(\bar U)$ is uniformly distributed on
$\mathbb{S}^{d-1-k}$. Thus, when $g$ is an element of $L^1(\mathbb{S}^{d-1})$ we can write for $k$ in $\{1, \ldots, d-1\}$,

$$\int_{\mathbb{S}^{d-1}} g(b)\, d\underline{\sigma}_{d-1}(b) = \int_{\mathbb{B}^k} \int_{\mathbb{S}^{d-1-k}} g\left(\rho(\bar b)\, u, \bar b\right) d\underline{\sigma}_{d-1-k}(u)\, d\bar P_*\underline{\sigma}_{d-1}(\bar b) \tag{6.1}$$

where $\mathbb{B}^k$ is the $k$ dimensional ball of radius 1. Setting $g(b) = f_\beta(b)\, I\{\bar b \in A\}$ for $A$ a Borel set of $\mathbb{B}^k$
shows that the marginal density of $\bar\beta$ with respect to the dominating measure $\bar P_*\underline{\sigma}_{d-1}$ is given by

$$f_{\bar\beta}(\bar b) = |\mathbb{S}^{d-1}| \int_{\mathbb{S}^{d-1-k}} f_\beta\left(\rho(\bar b)\, u, \bar b\right) d\underline{\sigma}_{d-1-k}(u). \tag{6.2}$$

One can use deterministic methods to compute the integral (e.g., Hesse et al. (2007) for quadrature
methods on the sphere) or for example one may use a Monte-Carlo method, by forming

$$\hat f_{\bar\beta}(\bar b) = \frac{|\mathbb{S}^{d-1}|}{M} \sum_{j=1}^{M} \hat f_\beta\left(\rho(\bar b)\, u_j, \bar b\right) \tag{6.3}$$

where $u_j$, $j = 1, \ldots, M$, are draws from independent uniform random variables on $\mathbb{S}^{d-1-k}$.
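The Monte-Carlo step can be sketched in a few lines. This is an illustration under assumptions: the joint density is passed as a hypothetical callable `f_joint` acting on an array of points of the unit sphere in $\mathbb{R}^d$, the retained block $\bar b$ has $k$ entries with norm below one, and uniform draws on the lower dimensional sphere are obtained by normalizing Gaussian vectors. None of these names come from the paper.

```python
import numpy as np
from math import gamma, pi

def sphere_area(d):
    """Surface area of the unit sphere in R^d."""
    return 2 * pi ** (d / 2) / gamma(d / 2)

def marginal_density_mc(f_joint, b_bar, d, n_draws=2000, seed=0):
    """Monte-Carlo average of f_joint over the fibre above b_bar, scaled by the sphere area."""
    b_bar = np.atleast_1d(np.asarray(b_bar, dtype=float))
    k = b_bar.size
    rho = np.sqrt(1.0 - b_bar @ b_bar)            # radius of the fibre above b_bar
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_draws, d - k))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform draws on the sphere in R^(d-k)
    points = np.hstack([rho * u, np.tile(b_bar, (n_draws, 1))])
    return sphere_area(d) * np.mean(f_joint(points))

# Sanity check: the uniform joint density should have a flat marginal equal to 1.
uniform = lambda pts: np.full(len(pts), 1.0 / sphere_area(3))
print(marginal_density_mc(uniform, [0.3], d=3))
```

With an estimated joint density in place of `uniform`, the same routine returns the marginal estimate on a grid of values of $\bar b$ at the cost of `n_draws` density evaluations per grid point.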

6.2. Treatment of non-random coefficients. It may be useful to develop an extension of the
method described in the previous sections to models that have non-random coefficients, at least for
two reasons.³ First, the convergence rate of our estimator of the joint density of $\beta$ slows down as
the dimension $d$ of $\beta$ grows, which is a manifestation of the curse of dimensionality. Treating some
coefficients as fixed parameters alleviates this problem. Second, our identification assumption in
Section 3.1 precludes covariates with discrete or bounded support. This may not be desirable as
many random coefficient discrete choice models in economics involve dummy variables as covariates.
As we shall see shortly, identification is possible in a model where the coefficients on covariates with
limited support are non-random, provided that at least one of the covariates with "large support" has
a non-random coefficient as well. More precisely, consider the model:

$$Y_i = I\{\beta_{1i} + \beta_{2i}' X_{2i} + \alpha_1 Z_{1i} + \alpha_2' Z_{2i} \ge 0\} \tag{6.4}$$

where $\beta_1 \in \mathbb{R}$ and $\beta_2 \in \mathbb{R}^{d_X - 1}$ are random coefficients, whereas the coefficients $\alpha_1 \in \mathbb{R}$ and $\alpha_2 \in \mathbb{R}^{d_Z - 1}$
are nonrandom. The covariate vector $(Z_1, Z_2')'$ is in $\mathbb{R}^{d_Z}$, though the $(d_Z - 1)$-subvector $Z_2$ might
have limited support: for example, it can be a vector of dummies. The covariate vector $(X_2', Z_1)'$
is assumed to be, among other things, continuously distributed. Normalizing the coefficients vector
and the vector of covariates to be elements of the unit sphere works well for the development of our
procedure, as we have seen in the previous sections. The model (6.4), however, is presented "in the
original scale" to avoid confusion.

³ Hoderlein et al. (2007) suggest a method to deal with non-random coefficients in their treatment of random coefficient
linear regression models.

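Define $\beta_1^*(Z_2) := \beta_1 + \alpha_2' Z_2$. We also use the notation

$$\tau(Z_2) := \frac{(\beta_1^*(Z_2), \alpha_1, \beta_2')'}{\|(\beta_1^*(Z_2), \alpha_1, \beta_2')'\|} \in \mathbb{S}^{d_X}, \qquad W := \frac{(1, Z_1, X_2')'}{\|(1, Z_1, X_2')'\|} \in \mathbb{S}^{d_X}.$$

Then (6.4) is equivalent to:

$$\begin{aligned}
Y &= I\{(\beta_1^*(Z_2), \alpha_1, \beta_2')(1, Z_1, X_2')' \ge 0\} \\
&= I\{\tau(Z_2)'W \ge 0\}.
\end{aligned}$$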

This has the same form as our original model if we condition on $Z_2 = z_2$. We can then apply previous
results for identification and estimation under the following assumptions. First, suppose $(\beta_1, \beta_2')'$ and
$W$ are independent, instead of Assumption 1.1. Second, we impose some conditions on $f_{W|Z_2=z_2}$, the
conditional density of $W$ given $Z_2 = z_2$. More specifically, suppose there exists a set $\mathcal{Z}_2 \subset \mathbb{R}^{d_Z - 1}$,
such that Assumption 3.1 holds if we replace $f_X$ and $d$ with $f_{W|Z_2=z_2}$ and $d_X + 1$ for all $z_2 \in \mathcal{Z}_2$. If
$Z_2$ is a vector of dummies, for example, $\mathcal{Z}_2$ would be a discrete set. By (4.1) and (2.16) we obtain



$$f^-_{\tau(Z_2)|Z_2=z_2}(t) = \sum_{p=0}^{\infty} \frac{1}{\lambda(2p+1, d_X+1)}\, E\left[\frac{(2Y - 1)\, q_{2p+1,d_X+1}(W,t)}{f_{W|Z_2=z_2}(W)}\, \Big|\, Z_2 = z_2\right] \tag{6.5}$$

for all $z_2 \in \mathcal{Z}_2$, where the right hand side consists of observables. This determines $f_{\tau(Z_2)|Z_2=z_2}$. That
is, the conditional density

$$f\left(\frac{(\beta_1^*(Z_2), \alpha_1, \beta_2')'}{\|(\beta_1^*(Z_2), \alpha_1, \beta_2')'\|}\, \Big|\, Z_2 = z_2\right)$$

is identified for all $z_2 \in \mathcal{Z}_2$ (here and henceforth we use the notation $f(\cdot|\cdot)$ to denote conditional
densities with appropriate arguments when adding subscripts is too cumbersome). This obviously
identifies

$$f\left(\frac{(\beta_1^*(Z_2), \alpha_1, \beta_2')'}{\|\beta_2\|}\, \Big|\, Z_2 = z_2\right) \tag{6.6}$$

for all $z_2 \in \mathcal{Z}_2$ as well. If we are only interested in the joint distribution of $\beta_2$ under a suitable
normalization, we can stop here. The presence of the term $\alpha_1 Z_1$ in (6.4) is unimportant so far.

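Some more work is necessary, however, if one is interested in the joint distribution of the
coefficients on all the regressors. Notice that the distribution (6.6) gives

$$f\left(\frac{\beta_1^*(Z_2)}{\|\beta_2\|}\, \Big|\, Z_2 = z_2\right) = f\left(\frac{\beta_1 + \alpha_2' Z_2}{\|\beta_2\|}\, \Big|\, Z_2 = z_2\right)$$

from which we can, for example, get

$$E\left[\frac{\beta_1^*(Z_2)}{\|\beta_2\|}\, \Big|\, Z_2 = z_2\right] = E\left(\frac{\beta_1}{\|\beta_2\|}\right) + E\left(\frac{1}{\|\beta_2\|}\right) \alpha_2' z_2 \quad \text{for all } z_2 \in \mathcal{Z}_2.$$

Define a constant

$$c := E\left(\frac{1}{\|\beta_2\|}\right),$$

then we can identify $c\alpha_2$ as far as $z_2 \in \mathcal{Z}_2$ has enough variation and

$$c\alpha_1 = E\left(\frac{\alpha_1}{\|\beta_2\|}\right)$$

is identified as well. Let

$$f\left(\frac{(\beta_2', \alpha_1, \alpha_2')'}{\|\beta_2\|}\right) \tag{6.7}$$

denote the joint density of all the coefficients (except for $\beta_1$, which corresponds to the conventional
disturbance term in the original model (6.4)), normalized by the length of $\beta_2$. Then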



(/3 2 ,ai,a' 2 ) 



Idx-l 















f 












CQ2 






cai _ 





\ 



J 



In the expression on the right hand side, $f\left((\beta_2', \alpha_1)'/\|\beta_2\|\right)$ is available from (6.6), and $c\alpha_1$ and $c\alpha_2$ are
identified already, therefore the desired joint density (6.7) is identified. Obviously (6.7) also determines
the joint density of $(\beta_2', \alpha_1, \alpha_2')'$ under other suitable normalizations as well.

The density (6.5) is estimable: when $Z_2$ is discrete, one can apply the estimator of Section 5 to
each subsample corresponding to each value of $Z_2$. If $Z_2$ is continuous we can estimate $f_{W|Z_2=z_2}$ and
the conditional expectation by nonparametric smoothing. An estimator for the density (6.6) can be
then obtained numerically.

6.3. Endogenous Regressors. Assumption 1.1 is violated if some of the regressors are endogenous
in the sense that the random coefficients and the covariates are not independent. This problem can
be solved if an appropriate vector of instruments is available. To be more specific, suppose we observe
$(Y, X, Z)$ generated from the following model

$$Y = I\{\beta_1 + \tilde\beta' X \ge 0\} \tag{6.8}$$

with

$$X = \Gamma Z + V \tag{6.9}$$

where $V$ is a vector of reduced form residuals and $Z$ is independent of $(\beta, V)$. Note that Hoderlein et
al. (2007) utilize a linear structure of the form (6.9) in estimating a random coefficient linear model.
The equations (6.8) and (6.9) yield

$$Y = I\{(\beta_1 + V'\tilde\beta) + Z'\Gamma'\tilde\beta \ge 0\}.$$

Suppose the distribution of $\Gamma Z$ satisfies Assumption 3.1. It is then possible to estimate the density
of $\bar\tau = \tau/\|\tau\|$ where $\tau = (\beta_1 + V'\tilde\beta, \tilde\beta')'$ by replacing $\Gamma$ with a consistent estimator, which is easy to
obtain under the maintained assumptions. This yields an estimator for the joint density of $\tilde\beta/\|\tau\|$, the
random coefficients on the covariates under scale normalization.



7. Numerical Examples 

The purpose of this section is to illustrate the performance of our new estimator in finite
samples using simulated data. We consider the model of the form (1.1) with $d = 3$. The covariates are
specified to be $X = (1, X_2, X_3)'$ where $(X_2, X_3)' \sim N\left((0,0)', 2 \cdot I_2\right)$. The coefficients vector $\beta = (\beta_1, \beta_2, 1)'$
is set random except for the last element. Fixing the last component constant fulfills Assumption 3.2
for identification. Two specifications for the random elements $(\beta_1, \beta_2)$ are considered. In the first
specification (Model 1) we let $(\beta_1, \beta_2)' \sim N\left((0,0)', 0.3 \cdot I_2\right)$. In the second (Model 2) we consider a two
point mixture of normals







' a 2 pa 2 




XN 


{(-:)< 




) 






pa 2 a 2 





+ (1 - X)N 





" a 2 pa 2 




((-;)• 




) 




pa 2 a 2 





where $\mu = 0.7$, $\sigma^2 = 0.3$, $\rho = 0.5$ and $\lambda = 0.5$. Random samples of size 500 from each of the two
specifications are generated, then the new estimator (5.1) is computed. It is implemented using the
Riesz kernel with $s = 2$ and $l = 3$ (see Proposition 2.3). The truncation parameter $T_N$ is set at 3,
and the trimming parameter $r$ is 2. It also requires a nonparametric estimator for $f_X$, and we use the
projection estimator (4.6) based on the same Riesz kernel (i.e. $s = 2$, $l = 3$) and $T_N = 10$.
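The Model 1 design can be reproduced with a short data-generating routine. The sketch below is an illustration under assumptions: the matrices $2 \cdot I_2$ and $0.3 \cdot I_2$ are read as covariance matrices, both vectors are rescaled to the unit sphere as in the rest of the paper, and the function name and seed are hypothetical.

```python
import numpy as np

def simulate_model1(n_obs=500, seed=0):
    """Draw (y, x, beta) for Model 1 and normalize x and beta to the unit sphere."""
    rng = np.random.default_rng(seed)
    # Covariates (1, X2, X3) with (X2, X3) ~ N(0, 2 I_2).
    x_raw = np.column_stack([np.ones(n_obs),
                             rng.normal(0.0, np.sqrt(2.0), size=(n_obs, 2))])
    # Coefficients (beta1, beta2, 1) with (beta1, beta2) ~ N(0, 0.3 I_2).
    beta_raw = np.column_stack([rng.normal(0.0, np.sqrt(0.3), size=(n_obs, 2)),
                                np.ones(n_obs)])
    y = (np.sum(x_raw * beta_raw, axis=1) >= 0).astype(int)
    x = x_raw / np.linalg.norm(x_raw, axis=1, keepdims=True)
    beta = beta_raw / np.linalg.norm(beta_raw, axis=1, keepdims=True)
    return y, x, beta

y, x, beta = simulate_model1()
print(y.mean(), x[:, 0].min(), beta[:, 2].min())
```

Rescaling by positive norms leaves the sign of $x'\beta$, and hence $y$, unchanged; the normalized covariates lie in the northern hemisphere because the constant is positive, and the normalized coefficients lie in a fixed hemisphere because the last coefficient is positive.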
Figure 1. Nonparametric estimator of $f_\beta$ for Model 1

Figure 2. Nonparametric estimator of $f_\beta$ for Model 2

Figure 1 presents the surface plot of the true density (blue mesh) and our estimate
(multi-colored surface) for Model 1. Our estimator (5.1) is defined on $\mathbb{S}^2$ in this case, and we performed
an appropriate transformation to plot it as a density on $\mathbb{R}^2$. With this moderate sample size, the
location of the peak of the density, as well as its shape, are successfully recovered by our procedure.
Next, Figure 2 shows the estimation results for Model 2. Again, our procedure works well: the
estimated surface plot captures the locations and shapes of the two peaks of the true
density, thereby exhibiting the underlying mixture structure. While further experiments are
necessary, these results seem to indicate our estimator's good performance in practical settings.



8. Conclusion

In this paper we have considered nonparametric estimation of a random coefficients binary
choice model. By exploiting (previously unnoticed) connections between the model and statistical
deconvolution problems and applying results on integral transforms on the sphere, we have developed
a new estimator that is practical and possesses desirable statistical properties. It requires neither
numerical optimization nor numerical integration; its computational cost is therefore trivial, and
local maxima and other difficulties in optimization are of no concern. Its rate of convergence
in the $L^q$ norm for all $q \in [1,\infty]$ is derived. Our numerical example suggests that the new procedure
works well in finite samples, consistent with its good theoretical properties. It is of great theoretical
interest to examine rigorously whether the rate is optimal in a minimax sense, though it is a task
we defer to subsequent investigations. With appropriate under-smoothing, the estimator is shown to
be asymptotically normal, providing a theoretical basis for nonparametric statistical inference for the
random coefficient distribution.

9. Appendix

We first summarize some results on the Gegenbauer polynomials, which are used in various
parts of the paper. These can be found in Erdélyi et al. (1953) and Groemer (1996). The Gegenbauer
polynomials have the following explicit representation

(9.1) $C_n^{\nu}(t) = \sum_{l=0}^{[n/2]} (-1)^l \frac{(\nu)_{n-l}}{l!\,(n-2l)!}\,(2t)^{n-2l}$

where $(a)_0 = 1$ and for $n$ in $\mathbb{N} \setminus \{0\}$, $(a)_n = a(a+1)\cdots(a+n-1) = \Gamma(a+n)/\Gamma(a)$. When $\nu = 0$ and
$d = 2$, it is related to the Chebychev polynomials of the first kind, as

$\forall n \in \mathbb{N} \setminus \{0\},\quad C_n^0(t) = \frac{2}{n}\,T_n(t)$

and

$C_0^0(t) = T_0(t) = 1$

hold for

$T_n(t) = \cos(n \arccos(t)), \quad n \in \mathbb{N}.$

When $\nu = 1$ and $d = 4$, $C_n^1(t)$ coincides with the Chebychev polynomial of the second kind $U_n(t)$,
which is given by

$U_n(t) = \frac{\sin[(n+1)\arccos(t)]}{\sin[\arccos(t)]}, \quad n \in \mathbb{N}.$

The Gegenbauer polynomials are related to each other through differentiation, that is, they satisfy

(9.2) $\frac{d}{dt} C_n^{\nu}(t) = 2\nu\, C_{n-1}^{\nu+1}(t)$
for $\nu > 0$ and

(9.3) $\frac{d}{dt} C_n^{0}(t) = 2\, C_{n-1}^{1}(t).$
For $\nu \neq 0$ the Rodrigues formula states that

(9.4) $C_n^{\nu}(t) = (-2)^{-n}\,\frac{(2\nu)_n}{n!\,(\nu + 1/2)_n}\,(1-t^2)^{-\nu+1/2}\,\frac{d^n}{dt^n}(1-t^2)^{n+\nu-1/2}.$


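The following results are also used in the paper:

(9.5) $\sup_{t \in [-1,1]} \left| \frac{C_n^{\nu}(t)}{C_n^{\nu}(1)} \right| \le 1,$

(9.6) $\forall \nu > 0,\ \forall n \in \mathbb{N},\quad C_n^{\nu}(1) = \binom{n + 2\nu - 1}{n},$

(9.7) $C_0^0(1) = 1$ and $\forall n \in \mathbb{N} \setminus \{0\},\ C_n^0(1) = \frac{2}{n},$

(9.8) $C_n^{\nu}(-t) = (-1)^n C_n^{\nu}(t).$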
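The identities collected above are easy to check numerically. The following is a minimal sketch, assuming the standard finite-sum representation of $C_n^{\nu}$ for $\nu > 0$ in terms of the symbol $(a)_n$; the function names are illustrative only.

```python
import math


def pochhammer(a, n):
    """Rising factorial (a)_n = a (a + 1) ... (a + n - 1), with (a)_0 = 1."""
    out = 1.0
    for k in range(n):
        out *= a + k
    return out


def gegenbauer(n, nu, t):
    """C_n^nu(t) for nu > 0 from the explicit finite-sum representation."""
    total = 0.0
    for l in range(n // 2 + 1):
        coef = (-1) ** l * pochhammer(nu, n - l)
        coef /= math.factorial(l) * math.factorial(n - 2 * l)
        total += coef * (2.0 * t) ** (n - 2 * l)
    return total


def chebyshev_u(n, t):
    """Chebychev polynomial of the second kind, U_n(t)."""
    theta = math.acos(t)
    return math.sin((n + 1) * theta) / math.sin(theta)


t = 0.3
# nu = 1 (d = 4): C_n^1 should coincide with U_n.
check_u = max(abs(gegenbauer(n, 1.0, t) - chebyshev_u(n, t)) for n in range(8))
# Differentiation rule (9.2), checked with a central difference.
nu, n, step = 1.5, 5, 1e-6
deriv = (gegenbauer(n, nu, t + step) - gegenbauer(n, nu, t - step)) / (2 * step)
check_deriv = abs(deriv - 2 * nu * gegenbauer(n - 1, nu + 1, t))
# Value at 1, (9.6): binom(n + 2 nu - 1, n) = (2 nu)_n / n!.
check_at_one = abs(gegenbauer(4, 1.5, 1.0) - pochhammer(3.0, 4) / math.factorial(4))
# Parity, (9.8).
check_parity = abs(gegenbauer(5, 1.5, -t) - (-1) ** 5 * gegenbauer(5, 1.5, t))
```

The finite sum is convenient for small degrees; for large $n$ it suffers from cancellation, and a three-term recurrence is the numerically safer choice.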

These orthogonal polynomials are normalized such that 

(9.9) 



In the proofs we often denote by $C$ a constant that depends only on the dimension $d$; its
value may change from one occurrence to the next and is determined by the context in which it is used.
Lemma 9.1. For $n$ positive and $d \ge 2$,

$\frac{d}{dt}\, q_{n,d}(t) = \frac{d\,|\mathbb{S}^{d+1}|}{|\mathbb{S}^{d-1}|}\; q_{n-1,d+2}(t).$

Proof. Using (2.6), (9.2), (9.3) and (9.6),

$\frac{d}{dt}\, q_{n,d}(t) = \frac{h(n,d)}{|\mathbb{S}^{d-1}|\, C_n^{\nu(d)}(1)}\,\frac{d}{dt} C_n^{\nu(d)}(t) = \frac{2n+d-2}{|\mathbb{S}^{d-1}|}\; C_{n-1}^{\nu(d+2)}(t).$

The desired result follows, since, using again (9.6) and (2.6),

$\frac{h(n-1,d+2)}{C_{n-1}^{\nu(d+2)}(1)} = \frac{2n+d-2}{d}.$

□
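Lemma 9.1 can be verified numerically for $d \ge 3$. The sketch below assumes the usual normalisation $q_{n,d}(t) = h(n,d)\,C_n^{\nu(d)}(t) / (|\mathbb{S}^{d-1}|\,C_n^{\nu(d)}(1))$ with $\nu(d) = (d-2)/2$, the standard dimension formula for $h(n,d)$, and the standard three-term recurrence for Gegenbauer polynomials; these definitions come from outside this appendix and are assumptions of the example.

```python
import math


def gegenbauer(n, nu, t):
    """C_n^nu(t), nu > 0, via n C_n = 2t(n+nu-1) C_{n-1} - (n+2nu-2) C_{n-2}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 2.0 * nu * t
    for k in range(2, n + 1):
        prev, cur = cur, (2.0 * t * (k + nu - 1) * cur - (k + 2 * nu - 2) * prev) / k
    return cur


def sphere_area(d):
    """Surface area |S^{d-1}| of the unit sphere in R^d."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)


def h(n, d):
    """Dimension of the space of surface harmonics of degree n on S^{d-1}, d >= 3."""
    num = (2 * n + d - 2) * math.factorial(n + d - 3)
    return num // (math.factorial(n) * math.factorial(d - 2))


def q(n, d, t):
    """Assumed normalisation of the zonal kernel q_{n,d} as a function of t = x'y."""
    nu = (d - 2) / 2.0
    return h(n, d) * gegenbauer(n, nu, t) / (sphere_area(d) * gegenbauer(n, nu, 1.0))


def lemma_gap(n, d, t, eps=1e-6):
    """|d/dt q_{n,d}(t) - d |S^{d+1}| / |S^{d-1}| q_{n-1,d+2}(t)| by central difference."""
    lhs = (q(n, d, t + eps) - q(n, d, t - eps)) / (2 * eps)
    rhs = d * sphere_area(d + 2) / sphere_area(d) * q(n - 1, d + 2, t)
    return abs(lhs - rhs)


gaps = [lemma_gap(n, d, 0.2) for d in (3, 4, 5) for n in (1, 2, 4, 6)]
```

All entries of `gaps` should be at the level of the finite-difference error.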



Proof of Proposition [2T3l First consider the Riesz kernel. follows from (2.4) in Ditzian (1998) 
and by the fact that Cesaro kernels C l h are uniformly bounded in L 1 (S rf_1 ) for I > (see, e.g. Bonami 
and Clerc 1973, p. 225). To show ((mj) we use Theorem 4.1 in Ditzian (1998), by letting P(D) = A 5 , 
A = (x,d + 1 = T(T + d— 2) + 1, a = s/2 and m = 1. Then it implies an approximation error upper 
bound CK s / 2 (f,A s , (£ Tjd + which, in turn, is bounded by CT~ s '||(-A' s ')' 5 / 2 i /'||p (see equations 

(4.2) and (4.1) therein). By the definition of the norm of the Sobolev space W*^- 1 ) (see Definition 
12. 3p the result follows. Concerning the delayed means, corresponds to the inequality (A16) of 
Hesse et al. (2007). To see (jmj) . use Proposition 15 in Hesse et al. (2007) to obtain an upper bound 
Cinf ge0 T/2 RnA \\f-g\\p- Let A = (T/24+ 1 = |(f +d-2)+l, a = s/2,m = 1,P{D) = A s in Ditzian's 
(1998) Theorem 6.1, which gives an upper bound on the best spherical harmonic approximation in 
L p (S d_1 ) to functions in Wp(S d_1 ) (see also Kamzolov, 1983), then apply equation (4.1) in Ditzian 
(1998) again to obtain the desired result. The proof of (ii) for both Riesz and delayed means kernels
is as follows. Write

$|K_T(z,x) - K_T(z,y)| \le \sum_{n=0}^{T} \chi(n,T)\,\left| q_{n,d}(z'x) - q_{n,d}(z'y) \right|$

and

$\left| q_{n,d}(z'x) - q_{n,d}(z'y) \right| = \left| \int_{z'y}^{z'x} \frac{d}{dt}\, q_{n,d}(t)\,dt \right|$

$\le \frac{d\,|\mathbb{S}^{d+1}|}{|\mathbb{S}^{d-1}|}\,\| q_{n-1,d+2} \|_\infty\, |x - y|$ (by Lemma 9.1)

$\le \frac{d}{|\mathbb{S}^{d-1}|}\, h(n-1,d+2)\,|x - y|$ (by (9.5) and the definition of $q_{n-1,d+2}$)

and conclude using that $\chi(n,T) \in [0,1]$ and (9.10) below. (iv) holds by setting $c$ to $(1/2)^l$ in the case
of the Riesz kernel and to 1 in the case of the delayed means. □

The following results are useful. 



Lemma 9.2.

(9.10) $h(n,d) \asymp n^{d-2},$
(9.11) $|\lambda(2p+1,d)| \asymp p^{-d/2}.$

Proof. Estimate (9.10) is clearly satisfied when $d = 2$ and 3 since $h(n,2) = 2$ and $h(n,3) = 2n+1$.
When $d \ge 4$ we have

$h(n,d) = \frac{2}{(d-2)!}\,(n + (d-2)/2)\,[(n+1)(n+2)\cdots(n+d-3)],$

and the results follow. Next we turn to (9.11). When $d$ is even and $p \ge d/2$

$|\lambda(2p+1,d)| = \frac{|\mathbb{S}^{d-2}|\; 1\cdot 3 \cdots (d-3)}{(2p+1)(2p+3)\cdots(2p+d-1)}$

and (9.11) follows. Stirling's double inequality (see Feller (1968) p. 50-53), that is,

$\sqrt{2\pi}\, n^{n+1/2} \exp\left(-n + \frac{1}{12n+1}\right) < n! < \sqrt{2\pi}\, n^{n+1/2} \exp\left(-n + \frac{1}{12n}\right)$

implies that

$\binom{2p}{p} \asymp \frac{2^{2p}}{\sqrt{p}}$

and therefore

$1\cdot 3 \cdots (2p-1) \asymp \frac{1}{\sqrt{p}}\; 2\cdot 4 \cdots (2p).$

Thus for $p \ge d/2$ and $d$ odd we have

$|\lambda(2p+1,d)| \asymp \frac{1}{\sqrt{p}}\,\frac{|\mathbb{S}^{d-2}|\; 2\cdot 4 \cdots (d-3)}{(2p+2)(2p+4)\cdots(2p+d-1)}$

and (9.11) holds for both even and odd $d$. □

Proof of Proposition 2.4. Define $a(n,d) := C_n^{\nu(d)}(1)\,|\mathbb{S}^{d-2}|^{-1}\,\lambda_n(\mathbb{I}\{t \in [0,1]\})$. By the Funk-Hecke
theorem

$a(n,d) = \int_0^1 C_n^{\nu(d)}(t)\,(1-t^2)^{(d-3)/2}\,dt,$

thus using (9.4),

$a(n,d) = \frac{(-2)^{-n}\,(d-2)_n}{n!\,((d-1)/2)_n} \int_0^1 \frac{d^n}{dt^n}(1-t^2)^{n+(d-3)/2}\,dt.$

Therefore for $n \ge 1$ and $d \ge 3$,

$a(n,d) = -\frac{(-2)^{-n}\,(d-2)_n}{n!\,((d-1)/2)_n}\;\frac{d^{n-1}}{dt^{n-1}}(1-t^2)^{n+(d-3)/2}\Big|_{t=0}$

since the term on the right hand-side is equal to 0 for $t = 1$. To prove that the coefficients $a(2p,d)$
are equal to zero for $p$ positive it is enough to prove



A2p+l 

dt 2 P+ l V ) 



0, Vm > 1, p > 0. 



t=o 



t=0 



The Faa di Bruno formula gives that this quantity is equal to 

( _ 1)2p+ i- fc2(2p + 1)!(m + 1} . . . (2p + i + m) _ +fc 
^ h\k 2 l {L 1 } [Zt) 

k 1 +2k 2 =2p+l 

and the result follows since k\ in the sum cannot be equal to 0. 

When n = 2p+ 1 for p £ N we obtain, again using the Faa di Bruno formula, that the derivative 
at t = is equal to 

,(2p)l 



(-l)f^: [(2p + 1 + (d - 3)/2)(2p + (d - 3)/2) • • • (p + 2 + (d - 3)/2)] . 



p! 

Together with (9.6), the desired result follows. For the case $d = 2$ we use Proposition 2.1. □

Proof of Proposition 2.6. By definition we have

$\|\mathcal{H}(f^-)\|_{2,s+d/2}^2 = \sum_{p=0}^{\infty} (1 + \zeta_{2p+1,d})^{s+d/2}\,\|Q_{2p+1,d}\mathcal{H}(f^-)\|_2^2$

where according to the Funk-Hecke Theorem

$Q_{2p+1,d}\mathcal{H}(f^-) = Q_{2p+1,d}\mathcal{H}\left(\sum_{q=0}^{\infty} Q_{2q+1,d} f^-\right)$

$= Q_{2p+1,d}\left(\sum_{q=0}^{\infty} \lambda(2q+1,d)\, Q_{2q+1,d} f^-\right)$

$= \lambda(2p+1,d)\, Q_{2p+1,d} f^-.$

The result follows since Lemma 9.2 implies that $(1+\zeta_{2p+1,d})^{s+d/2}\,\lambda^2(2p+1,d) \asymp (1+\zeta_{2p+1,d})^{s}$. □
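The spectral facts used in the preceding proofs (growth of $h(n,d)$, vanishing of the even coefficients $a(2p,d)$, and the decay $|\lambda(2p+1,d)| \asymp p^{-d/2}$) can be checked numerically. The sketch below computes $a(n,d)$ from its integral representation with the substitution $t = \cos\theta$ and Simpson's rule, and recovers $\lambda(n,d) = |\mathbb{S}^{d-2}|\,a(n,d)/C_n^{\nu(d)}(1)$; the quadrature scheme, the recurrence used for $C_n^{\nu}$ and the function names are choices made for this example.

```python
import math


def gegenbauer(n, nu, t):
    """C_n^nu(t), nu > 0, via the standard three-term recurrence."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 2.0 * nu * t
    for k in range(2, n + 1):
        prev, cur = cur, (2.0 * t * (k + nu - 1) * cur - (k + 2 * nu - 2) * prev) / k
    return cur


def sphere_area(d):
    """Surface area |S^{d-1}| of the unit sphere in R^d."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)


def h(n, d):
    """Dimension of the space of surface harmonics of degree n on S^{d-1}, d >= 3."""
    num = (2 * n + d - 2) * math.factorial(n + d - 3)
    return num // (math.factorial(n) * math.factorial(d - 2))


def a_coef(n, d, m=2000):
    """a(n,d) = int_0^1 C_n(t) (1-t^2)^{(d-3)/2} dt, with t = cos(theta), by Simpson."""
    nu = (d - 2) / 2.0
    step = (math.pi / 2.0) / m

    def g(theta):
        return gegenbauer(n, nu, math.cos(theta)) * math.sin(theta) ** (d - 2)

    total = g(0.0) + g(math.pi / 2.0)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(k * step)
    return total * step / 3.0


def lam(n, d):
    """Eigenvalue lambda(n,d) = |S^{d-2}| a(n,d) / C_n^{nu(d)}(1)."""
    return sphere_area(d - 1) * a_coef(n, d) / gegenbauer(n, (d - 2) / 2.0, 1.0)


even_coefs = [abs(a_coef(2 * p, d)) for d in (3, 4) for p in range(1, 6)]
scaled = {d: [abs(lam(2 * p + 1, d)) * p ** (d / 2.0) for p in range(3, 11)] for d in (3, 4)}
```

The entries of `even_coefs` should vanish up to quadrature error, and each list in `scaled` should stay within a bounded range, in line with (9.11).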

Proof of Proposition 2.7. If we consider the case where $d$ is even, we know from Proposition 2.4
that

A(2p + i >d ) = (- 1 ) P I^ 2 K2P + l)(2p + 3) . . . (d + 2p - 1). 
Thus if d is a multiple of 4, 

A (2 p+ i,d) = I 3 "" 2 ! II [-C%H-W + 2 ( fc " " 2fc )]- 

Using this and (|2.16p . 



k=l 



oo 

A?A(2p + l, 



_ A(2p + l,d) 



!2p+\,d 



oo / d/4 \ 

= J> d ~ 2 | JJ[-<2p+ M + 2(A!-l)(d-2fc)] Q 2p+M . 

p=o \fc=i y 

Recall Definition 2.3 and the proposition is proved. □
Proof of Theorem 2.3. We can write

n~ x = p 1 {d)-p 2 {d) 

where P\{D) and P 2 (D) are defined for all odd function / _ by 

°° 1 f 

W = E^ J sd _ iq4 p +3 (x,y)f-(x)da(x) 

oo If 

P2(D)f- = ~E A(4p + x) 94 P+ i(x,?/)/-(x)^(x). 

Pi(D) and P 2 (D) are two unbounded operators on £> = L^ dd (S rf_1 ) with non-positive eigenvalues. We 
apply Theorem 3.2. of Ditzian (1998) to —P\{D) and —P 2 {D) choosing a = 1. Condition (1.6) of 




Ditzian (1998) can be verified using Proposition 2.2 with r = 1 and p = q and the fact that for the 
Cesaro kernels C l h are uniformly bounded in L 1 (S d_1 ) for I > (see, e.g. Bonami and Clerc, 1973). 
We see, using the triangle inequality, that for all P in ©p =0 H 2p+1 ' d , 

l]H ~ lp]lq - C X\2T + l,d) miq 
< CT d \\P\\ q . 

The last inequality follows from (|9.1ip . □ 

Proof of Proposition 3.1. It is straightforward that the model (1.1) and Assumption 1.1 imply
that the choice probability function $r$ given by (1.2) is homogeneous of degree 0. Proposition 2.5
along with the fact that $R = \frac{1}{2} + \mathcal{H}(f_\beta^-)$ with $f_\beta^- \in L^2(\mathbb{S}^{d-1})$ implies that $R$ belongs to $H^{d/2}(\mathbb{S}^{d-1})$.
We now turn to the proof of sufficiency. If the extension R given by (1221) belongs to H d / 2 (S d " 1 ) then 
so does R~ and Proposition 12.51 shows that there exists a unique odd function / _ in L 2 (S d ~ 1 ) such 
that 

Moreover, since < R(x) < 1 holds for every x € S d_1 , the above relationship implies that ^ > 
Hf~(x),Vx G S d_1 . But Hf~(x) > /{/-(6)>o} f ~(.b)da(b) holds for some x. Therefore we conclude 
that \ > / {/ -( 6 )> } f~( b )dcr{x) = - /{/- (6 )< } f~{ h )dcr{b), thus \f~(b)\da(b) < 1. Also, following 
the discussion in Section [2.21 ^=t\ + / _ integrates to 1. We have seen in Corollary 12.11 that for even 
function g that has as the coefficient of degree in its expansion on the surface harmonics (i.e. an 
even function that integrates to zero over the sphere), 



holds. Now consider 



then this certainly is even and integrates to zero. Using this, define 

$ := 9 + W=T\ + f ' = 2ri{r > 0} + ]sFI| i 1 ~ WWV) * °" 
Obviously f%~~ = / — . This function /2 is non-negative and integrates to one, and thus it is a proper 
probability density function (pdf). It is indeed bounded from below by jg3=ri (l — Igd-i \f (b)\da(b)) . 
As a consequence, there exists a pdf f% such that 

R = n{r p ) = \+n{f;-) 
and for all $x$ in $H^+$, $r(x) = \mathcal{H}(f_\beta^*)(x)$. □

Proof of Theorem 4.1. $R$ has the following condensed harmonic expansion

$R(x) = \frac{1}{2} + \sum_{p=0}^{\infty} (Q_{2p+1,d}R)(x).$

We then write using (3.2), changing variables and using (9.8),

$(Q_{2p+1,d}R)(x) = \int_{\mathbb{S}^{d-1}} q_{2p+1,d}(x,z)\,R(z)\,d\sigma(z)$

$= \int_{H^+} q_{2p+1,d}(x,z)\,r(z)\,d\sigma(z) + \int_{H^-} q_{2p+1,d}(x,z)\,(1 - r(-z))\,d\sigma(z)$

$= \int_{H^+} q_{2p+1,d}(x,z)\,r(z)\,d\sigma(z) - \int_{H^+} q_{2p+1,d}(x,z)\,(1 - r(z))\,d\sigma(z)$

$= \int_{H^+} q_{2p+1,d}(x,z)\,\frac{E[2Y-1 \mid X = z]}{f_X(z)}\,f_X(z)\,d\sigma(z)$

$= E\left[\frac{(2Y-1)\,q_{2p+1,d}(x,X)}{f_X(X)}\right].$

□
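The moment identity just derived suggests replacing the expectation by a sample average. The following Monte Carlo sketch illustrates this in the simplest case $d = 2$, where the circle is parametrised by an angle and $q_{n,2}(x,z) = \cos(n(\theta_x - \theta_z))/\pi$. The design is purely hypothetical: $X$ is uniform on the half circle $H^+$ (so $f_X = 1/\pi$) and $\beta$ is degenerate at angle `phi0`, in which case $(Q_{1,2}R)(x) = (2/\pi)\cos(\theta_x - \phi_0)$ by direct integration.

```python
import math
import random


def q_circle(n, theta_x, theta_z):
    """q_{n,2}(x, z) = cos(n (theta_x - theta_z)) / pi on the circle, n >= 1."""
    return math.cos(n * (theta_x - theta_z)) / math.pi


def sample_analog(theta_x, data, n=1, f_x=1.0 / math.pi):
    """Sample average of (2 y_i - 1) q_{n,2}(x, x_i) / f_X(x_i), with known f_X."""
    total = sum((2 * y - 1) * q_circle(n, theta_x, th) / f_x for y, th in data)
    return total / len(data)


rng = random.Random(0)
phi0 = 0.4  # hypothetical: all agents share the coefficient direction phi0
data = []
for _ in range(20000):
    th = rng.uniform(-math.pi / 2, math.pi / 2)  # covariate angle in H^+
    y = 1 if math.cos(th - phi0) >= 0 else 0     # y = 1{x'beta >= 0}
    data.append((y, th))

theta_x = 0.1
estimate = sample_analog(theta_x, data)
target = 2.0 / math.pi * math.cos(theta_x - phi0)
```

In practice $f_X$ is unknown and must be replaced by an estimate, with trimming to keep the denominator away from zero; the sketch uses the true density only to isolate the identity itself.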



Proof of Theorems 4.2 and 4.3. The proof concerning the estimation of $R$ is the same as that
of $f_\beta$ below (though the latter requires a step that uses Theorem 2.3, which is not necessary for the
former). □

Now we turn to the proofs of Theorems 5.1 and 5.2. For notational convenience we simply
write $I(b) := \mathbb{I}\{f_\beta(b) > 0\}$ and $\hat{I}(b) := \mathbb{I}\{\hat{f}_\beta(b) > 0\}$. Then $f_\beta = 2f_\beta^- I$ and $\hat{f}_\beta = 2\hat{f}_\beta^- \hat{I}$. Define

f/3,T = % 1r t 
7p~=H- 1 R-. 

where 



N 



(2jfe - l)K 2T (xj,x) 
N ^ max (fx{xi), (log N)~ r ) 



1 " (2 yi - l)K 2T ( 



fx fa 



We use the decomposition 



(9.12) /" -fp={ fp - fp, T - fp, T - E fp 



E 



fp,T "E fp 



E 



fp ~ fp 
and denote the terms on the right hand side by $S_p$ (stochastic component due to plug-in), $S_e$ (stochastic
component of the infeasible estimator), $B_t$ (trimming bias) and $B_a$ (approximation bias). Note
that the same decomposition, with $\mathcal{H}$ operated on each term, can be used to show Theorems 4.2 and
4.3.



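Proof of Theorem 5.1. Take $q \in [1,\infty)$,

$\|\hat{f}_\beta - f_\beta\|_q^q = \int_{I(b)=1,\hat{I}(b)=1} |\hat{f}_\beta(b) - f_\beta(b)|^q\,d\sigma(b) + \int_{I(b)=0,\hat{I}(b)=1} |\hat{f}_\beta(b) - f_\beta(b)|^q\,d\sigma(b)$

$\qquad + \int_{I(b)=1,\hat{I}(b)=0} |\hat{f}_\beta(b) - f_\beta(b)|^q\,d\sigma(b) + \int_{I(b)=0,\hat{I}(b)=0} |\hat{f}_\beta(b) - f_\beta(b)|^q\,d\sigma(b)$

$= A_1 + A_2 + A_3 + A_4.$

Obviously

$A_1 = \int_{I(b)=1,\hat{I}(b)=1} |2\hat{f}_\beta^-(b) - 2f_\beta^-(b)|^q\,d\sigma(b)$

and $A_4 = 0$. Also,

$A_2 = \int_{I(b)=0,\hat{I}(b)=1} |2\hat{f}_\beta^-(b) - f_\beta(b)|^q\,d\sigma(b).$

But given $I(b) = 0$ and $\hat{I}(b) = 1$, $2\hat{f}_\beta^-(b) > 0$, $f_\beta(b) = 0$ and $2f_\beta^-(b) \le 0$, so replacing $f_\beta$ with $2f_\beta^-$ in
the bracket,

$A_2 \le \int_{I(b)=0,\hat{I}(b)=1} |2\hat{f}_\beta^-(b) - 2f_\beta^-(b)|^q\,d\sigma(b).$

Similarly,

$A_3 = \int_{I(b)=1,\hat{I}(b)=0} |\hat{f}_\beta(b) - 2f_\beta^-(b)|^q\,d\sigma(b)$

and given $I(b) = 1$ and $\hat{I}(b) = 0$, $2f_\beta^-(b) > 0$, $\hat{f}_\beta(b) = 0$ and $2\hat{f}_\beta^-(b) \le 0$, so replacing $\hat{f}_\beta$ with $2\hat{f}_\beta^-$ in
the bracket,

$A_3 \le \int_{I(b)=1,\hat{I}(b)=0} |2\hat{f}_\beta^-(b) - 2f_\beta^-(b)|^q\,d\sigma(b).$

Overall,

$\|\hat{f}_\beta - f_\beta\|_q \le 2\,\|\hat{f}_\beta^- - f_\beta^-\|_q.$

A similar proof can be carried out replacing $L^q(\mathbb{S}^{d-1})$ by $L^\infty(\mathbb{S}^{d-1})$. Thus it is enough to consider
the behavior of $\hat{f}_\beta^- - f_\beta^-$ instead of $\hat{f}_\beta - f_\beta$. As noted above, the former can be decomposed into four
terms, $S_p$, $S_e$, $B_t$ and $B_a$.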
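The reduction above rests on the pointwise fact that the map $u \mapsto 2u\,\mathbb{I}\{u > 0\}$ is Lipschitz with constant 2. A small numerical illustration on a discretised sphere follows; the grid values are arbitrary random numbers and the averaging norm is only a stand-in for the $L^q(\mathbb{S}^{d-1})$ norm.

```python
import random


def density_from_odd_part(u):
    """Map a value of the odd part f^-(b) to 2 f^-(b) 1{f^-(b) > 0}."""
    return 2.0 * u if u > 0 else 0.0


def lq_norm(values, q):
    """Discrete L^q norm with respect to the uniform measure on the grid."""
    return (sum(abs(v) ** q for v in values) / len(values)) ** (1.0 / q)


rng = random.Random(1)
f_minus = [rng.uniform(-1.0, 1.0) for _ in range(1000)]          # stand-in for f^-
f_minus_hat = [u + rng.gauss(0.0, 0.3) for u in f_minus]         # stand-in for its estimate

q = 3
lhs = lq_norm(
    [density_from_odd_part(a) - density_from_odd_part(b) for a, b in zip(f_minus_hat, f_minus)],
    q,
)
rhs = 2.0 * lq_norm([a - b for a, b in zip(f_minus_hat, f_minus)], q)
```

Here `lhs` never exceeds `rhs`, whatever the values on the grid, which is the content of the inequality $\|\hat{f}_\beta - f_\beta\|_q \le 2\|\hat{f}_\beta^- - f_\beta^-\|_q$.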
Let the sequence of smoothing parameters satisfy:

(9.13) $T_N \asymp \left( \frac{N}{(\log N)^{2(r + (1/2 - 1/q)\mathbb{I}\{q > 2\})}} \right)^{\gamma}$

for some $\gamma > 0$. We later show that the above form with the choice $\gamma = \frac{1}{2s+2d-1}$ leads to the optimal
rate of convergence for $\hat{f}_\beta$.
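For concreteness, the rule (9.13) with $\gamma = 1/(2s+2d-1)$ can be evaluated as below. This is a sketch only: the proportionality constant in (9.13) is set to 1, and the function names are illustrative.

```python
import math


def truncation_level(n_obs, s, d, r, q):
    """T_N from (9.13) with gamma = 1 / (2s + 2d - 1) and constant set to 1."""
    gamma = 1.0 / (2 * s + 2 * d - 1)
    extra = (0.5 - 1.0 / q) if q > 2 else 0.0   # (1/2 - 1/q) 1{q > 2}
    log_power = 2.0 * (r + extra)
    return (n_obs / math.log(n_obs) ** log_power) ** gamma


def rate(n_obs, s, d, r, q):
    """Order T_N^{-s} of the approximation bias under this choice of T_N."""
    return truncation_level(n_obs, s, d, r, q) ** (-s)


t_small = truncation_level(10_000, s=2, d=3, r=2, q=2)
t_large = truncation_level(1_000_000, s=2, d=3, r=2, q=2)
t_sup = truncation_level(1_000_000, s=2, d=3, r=2, q=math.inf)
```

The truncation level grows slowly with $N$, and for $q > 2$ the extra logarithmic factor makes it slightly smaller than for $q \le 2$.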

We start with the analysis of S p . Note that for q € [1, 00] 



|Spll,= 



<B{d iq )T d J 2 



i(ly (2^-1)^(^,0 / max(/ x (^),(logjV)- r ) \ \ 
^zaaxifxixi), (log N)-^) ^ max ff x ^ (logiV)--") )) 

l_ " (2 yi -l)K- TN ( Xl ,-) I max(/ x (^),(logiV)- r ) _ \ 
N ^max(f x (xi),(logN)- r ) ^ max (f x ( Xi ), (logiV)"^ ) 



(by Theorem 



<B{d,q)T d N /2 (log ^ 



-^|X 2Tjv (x,,0| 



max 

= 1,...,JV 



max(/ x (a;i),(logJV)-'-) 



(fx( Xi ),{\ogN)- 



holds, where we have used the triangle inequality. The L^-norm on the right hand side is bounded 
from above by 



(9.14) 



1 N 

-^|if 2Tjv (^v)|-E|^ 2Tjv (X r 



+ \\¥,\K 2Tn (X,- 



First consider the term ||Ti|| g . We begin with the case of q € [1,2]. By the Holder inequality, 



E[Ii(x) 9 ]d<7(x) 



' / E rTi(x) 2 l 9//2 dcr(x) 



11 



where 
(9.15) 



1 



EfJiOr) 2 ] <-E (K 2Tn (X,x)) 



N 

C 2 

< — ||i^2Tjv(*2 5 aj)|| 2 (boundedness assumption on 



JV 

C 
N 

C 



2T K 



n=0 



(by Assumption 12.1 flivl) ) 



n=0 
27\ 



c -r Nh 2 (n,d) C v n {d \*' 2 x) 

n=0 



2d— 112/ 



(^ (d) (l)) 2 



<jyj>(n,d) (by (JO 

n=0 
CT d-l 



< 



-A 1 



(by lemma [9^2 



By the Markov inequality, 
(9.16) 



Z# /a (logi\O r ra, = P ((logiV) r iV- 1/2 T; M - 1)/2 



-jv 



providing a convergence rate for $\|T_1\|_q$, $q \in [1,2]$. So if we can establish a similar rate for $\|T_1\|_\infty$,
all $L^q(\mathbb{S}^{d-1})$ convergence rates of $T_1$ for $q \in (2,\infty]$ can be interpolated between the $L^2(\mathbb{S}^{d-1})$ and
$L^\infty(\mathbb{S}^{d-1})$ convergence rates using the following inequality:

(9.17) $\forall f \in L^\infty(\mathbb{S}^{d-1}),\quad \|f\|_q \le \|f\|_2^{2/q}\,\|f\|_\infty^{1-2/q}.$

To see this, note

$\|f\|_q = \left\| f^2\,|f|^{q-2} \right\|_1^{1/q} \le \left[ \|f^2\|_1\,\left\| |f|^{q-2} \right\|_\infty \right]^{1/q}$ (by Hölder).

We can thus focus on $\|T_1\|_\infty$. We cover the sphere $\mathbb{S}^{d-1}$ by $\mathfrak{N}(N,r,d)$ geodesic balls (caps) $(B_i)_{i=1}^{\mathfrak{N}(N,r,d)}$
of centers $\tilde{x}_i$ and radius $R(N,r,d)$, that is, $B_i = \{x \in \mathbb{S}^{d-1} : \|x - \tilde{x}_i\| \le R(N,r,d)\}$. As the
notation suggests, we let the radius of the balls depend on $N$, $r$ and $d$, as specified more precisely
below. Note that $\mathfrak{N}(N,r,d) \asymp R(N,r,d)^{-(d-1)}$.
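The interpolation inequality (9.17) holds for any measure, so it can be illustrated on a finite vector with the counting measure. The values below are arbitrary and serve only as an example.

```python
import random


def lq_norm(values, q):
    """L^q norm with respect to the counting measure; q may be float('inf')."""
    if q == float("inf"):
        return max(abs(v) for v in values)
    return sum(abs(v) ** q for v in values) ** (1.0 / q)


rng = random.Random(2)
f = [rng.gauss(0.0, 1.0) for _ in range(500)]

checks = []
for q in (2.5, 3.0, 5.0, 10.0):
    lhs = lq_norm(f, q)
    rhs = lq_norm(f, 2) ** (2.0 / q) * lq_norm(f, float("inf")) ** (1.0 - 2.0 / q)
    checks.append(lhs <= rhs + 1e-12)
```

Every entry of `checks` is true, reflecting that a rate in $L^2$ together with a rate in $L^\infty$ yields a rate in every intermediate $L^q$.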
We now prove that for every $\epsilon > 0$, there exists a positive $M$ such that

(9.18) $P\left( v_N T_N^{d/2} (\log N)^r \sup_{x \in \mathbb{S}^{d-1}} |T_1(x)| \ge M \right) \le \epsilon$

holds for an appropriately chosen sequence $v_N \uparrow \infty$. Write



(9.19) 



v N T d/2 (logN) r sup >M 

< IF i |J [v N T* /2 (log NYlT^l > M/2} 
,i=l,...,m(Ar,r,d) 



+ P Ui € {1,.. . ,(K(iV,r,d)} : v N T% 2 <)agN) r sup |Ti(a;) - T^x,)! > M/2^ 



< 0T(7V,r,d) sup F[v N T* /2 (log NYlT^Xi)] > M/2 



i=i,...,<n N 



where the last inequality is obtained using Assumption 12.11 (ju]) on the kernel and letting R(N, r, d) 
(logN)- r v N 1 T N {d,2+a) M (where a is given in Assumption 12.11 ([n|) ) . Notice 



(9.20) P [v N T d J 2 '(log NYlTt^i)] > M/2 



where 



N 



\K 2 T N (xj,X lJ 



< 2 exp 



rpd— 1 

3=1 N 

If t 2 



E 



\K 2 t n (X, Xj 

rpd—l 

1 N 



> T~^' l) v~ l T~ d,2 (logN)~ r NM/2 | 



2 + 



(Bernstein inequality) 



t = T N {d - 1] v N l T N d/2 (log N)- r N M/2 



N 



OJ > var 



\K2T N (Xj,X l} 



Vj = l,...,iV, 



d-l 



A? 



K2T N (Xj,Xi) 



r d-l 



< L (using (|ZSD and $EE)). 



The bound L in the last line is obtained by noting that \K 2 t n (Xj, Xi)\ = Y^n=o x( n i 2T]y)q n;( i(Xj, 



< 



C J2 2 n=o \ h ( n , d)\ X Tff 1 , which follows from QM§, ([93]) and (|S1D]I . Here we can take w = CA^E[^ 2 T JV (^, 



then by the calculations in (|9.15p . we can write oj = CNT N 
inator of the exponent in the last inequality. 



(d-i) 



oj is the leading term in the denom- 






If we take v N = (log N)- r ~ 1 / 2 N 1 / 2 T N i2d 1)/2 , then 



(9.21) -?— x{logN)M *. 

Also, use this vn and the form of TV as specified in (|9.13p in our choice of R(N, r, d) made above to 
get: 

R(N,r,d) x (logN)- r v^T~ W2+ ^M = (logiN^N'^T^M 

Thus 

(9.22) Vl(N,r,d) x R(N, r, d)"^ 1 ) = exp Q (d - 1) log N + o(log N) J = exp (Ci log A + o(log A)) 
with Ci = \{d - 1). ([S12D, (ROD]) . dEZID and ([9T2^) imply that, for a positive constants C and C 2 , 

(9.23) P [ v N T d J 2 {\ogN) r sup \T x {x)\ > M j < Cexp {(log A)(Ci - C 2 M 2 )} 

holds. For a large enough $M$, $C_1 - C_2M^2 < 0$ and the right hand side of (9.23) converges to zero, so
(9.18) follows. In summary, we have just shown that

$T_N^{d/2}(\log N)^r\,\|T_1\|_\infty = O_p\left((\log N)^{r+1/2} N^{-1/2} T_N^{(2d-1)/2}\right)$

and with (9.16) and (9.17) we also conclude that

$T_N^{d/2}(\log N)^r\,\|T_1\|_q = O_p\left((\log N)^{r+1/2-1/q} N^{-1/2} T_N^{(2d-1)/2}\right).$

Concerning $\|T_2\|_q$, $q \in [1,\infty]$, since $f_X$ is bounded by assumption, there exists a positive $C$ such that

$\|T_2\|_q \le C\,\left\| \|K_{2T_N}(\star_1,\star_q)\|_1 \right\|_q$

where integration in $\|\cdot\|_1$ is with respect to argument $\star_1$ and integration in $\|\cdot\|_q$ is with respect to
$\star_q$. But $\|K_{2T_N}(\star_1,\star_q)\|_1$ is a constant and does not depend on $\star_q$, as previously noted. Thus

$\left\| \|K_{2T_N}(\star_1,\star_q)\|_1 \right\|_q = |\mathbb{S}^{d-1}|^{1/q}\,\|K_{2T_N}(\star_1,\star_q)\|_1$

and we conclude that this term is $O(1)$ using Assumption 2.1 on the kernel, thus

$T_N^{d/2}(\log N)^r\,\|T_2\|_q = O\left((\log N)^r T_N^{d/2}\right).$

For the choice made later for $T_N$, this term is of smaller order than the first term $T_N^{d/2}(\log N)^r\,\|T_1\|_q$.
Analogously to our treatment of $\|T_1\|_q$, we can prove that when $q \in [1,2]$,

$\|S_e\|_q = O_p\left((\log N)^r N^{-1/2} T_N^{(2d-1)/2}\right)$

while for $q \in (2,\infty]$

$\|S_e\|_q = O_p\left((\log N)^{r+1/2-1/q} N^{-1/2} T_N^{(2d-1)/2}\right).$



Let us now turn to the bias term induced by trimming 

"(2Y- \)H- 1 (K-(X,-)) (b) ( f x ( X ) 



B t (b)=E 



fx(X) \m a x(fx(X),(logN)-r) 

i[2Y - 1|X = zjTT 1 (K- Tn (z, •)) (6) (.fe(z)(logiV) r - l)da(z). 



Using Theorem 12 . 31 along with Proposition 12, 21 with r = q and p = 1, where the L 9 -norm of the kernel is 
interpolated using Holder's inequality between the uniformly bounded L^-norm and the upper bound 
on the sup norm of the order of Tff 1 seen previously, we have 

\\B t \\ q < T d N /2+id - 1){1 - lM a(0 <f x < (log iV)-). 

We finally treat $B_a$ using Assumption 2.1 (iii) with the condition that $f_\beta^- \in W_q^s(\mathbb{S}^{d-1})$:

$\|B_a\|_q \le C\,T_N^{-s}.$

We now choose $T_N$ to balance the bounds for the approximation bias $B_a$ and the stochastic fluctuation
$S_e$ of the infeasible estimator. This can be achieved by setting

$(\log N)^{r + (1/2 - 1/q)\mathbb{I}\{q>2\}}\,N^{-1/2}\,T_N^{(2d-1)/2} \asymp T_N^{-s}.$

Solve this to obtain (9.13) with $\gamma = 1/(2s+2d-1)$. For this choice of $T_N$ both terms are of the order

$\left( \frac{N}{(\log N)^{2(r + (1/2 - 1/q)\mathbb{I}\{q>2\})}} \right)^{-s/(2s+2d-1)}$

which is the desired rate of convergence. It is easy to check that Assumption 4.1 implies that



V N (log N) r T d/2 max 



i=l,...,N 



max (fx(xi), (log N)- r ) 



max (fx( x i)i (l°g N Y 



1 



o P (i) 



V N T% d/2 - l - {d - 1)/q o(<d <f x < (logN)- r ) = 0(1). 

This proves the $L^q$ convergence result.

In order to prove the strong uniform consistency, noticing that the bias terms $B_t$ and $B_a$ are
non-stochastic and bounded after proper scaling, we just have to focus on $S_p$ and $S_e$. Concerning $S_p$,
proceed as before and note that taking $M$ large enough so that $C_1 - C_2M^2 < -1$ implies summability
of the left hand side in (9.23). We conclude from the first Borel-Cantelli lemma that the probability
that the events occur infinitely often is zero, thus with probability one

$\limsup_{N \to \infty}\; v_N T_N^{d/2} (\log N)^r \sup_{x \in \mathbb{S}^{d-1}} |T_1(x)| \le M.$

The term $T_2$ is non-stochastic and its treatment in our previous analysis remains valid, therefore we
can use the same non-stochastic upper bound. We then use Assumption 4.1 (iii) instead of Assumption
4.1 (ii) to show the almost sure uniform boundedness of $S_p$ after proper rescaling. The treatment of
$S_e$ is analogous to that of $T_1$. □



Proof of Theorem 5.2. We first prove that the Lyapounov condition holds: there exists $\delta > 0$ such
that for $N$ going to infinity,

(9.24) $\frac{E\left| Z_N(b) - E[Z_N(b)] \right|^{2+\delta}}{N^{\delta/2}\,\left(\mathrm{var}(Z_N(b))\right)^{1+\delta/2}} \to 0$

(see, e.g. Billingsley, 1995). We start from deriving a lower bound on $\mathrm{var}(Z_N(b))$. Since $E[Z_N(b)]$
converges to $f_\beta^-(b)$, it is enough to obtain a lower bound on

E[Z% A ](b) 

'T N -1 



-4/ ( V X (2p+1,2T N ) 
Jh+ \ „_ n 



p=0 



max(f x (z),(logN)-r) \(2p + l,d) 



fx{z)da(z) 



4/ ( V) X(2p+1,2T N ) 
Jh+ \ P =o 



1 



g2p+i,d(z,b) \ 
\(2p + l,d) \f x (z) 



Mfx > (logN)- r } + f x (z)(logN) 2r l{f x < (logiV)- r } da(z) 



> 



4 FfV/ (E X(2p+1,2T N ) 

\\jx\\oo Jh+ \ ^ 



g2 P +i,d(g, b) 
X(2p + l,d) 



da(z) 



-4- 



/T N -1 



||/x||oo J{0<f x <(logN)-r} 



£ x(2p+ 1,27V) 



llxxlloo prj Jh+ A(2p+ l,dy 



p=0 



/T N -1 



g2p+l,d(z,6) 

A(2p + l,d) 



da(z) 



\\fx\\oo J{0<fx<(logN)-r} 



Using (|9.5p and Lemma 19.21 we see that there exists a constant C such that 



T N -1 



]T X (2p+ 1,22V) 



Q2 P +i,d(z,*) 



p=0 



A(2p + l,d) 
therefore using Proposition 2.2 we obtain

EK 1 ](6)>-^ T £ 1 X (2 P +1,2T W ) 2 / q ^'^% da{z) - CT^a (0 < f x < (log A)-) . 



Using Assumption 12.11 ([rvjl . the first term on the right hand side can be bounded from below by 

L(Tjv-1)/2J 2 
92p+l,d(*, b) 



p=0 



A(2p + l,d) 



i.e. by CT N . Thus as a (0 < /x < (log A'') r ) decays fast enough to zero under the assumption of 
the theorem (here it is enough to have a (0 < fx < (log N)~ r ) = 0(T^ d+1 )), 



(9.25) 



E[Z*](b) > CT 



2d-l 
N 



We now derive an upper bound of $E|Z_N(b)|^{2+\delta}$ using Theorem 2.3 and interpolation between
$L^\infty(\mathbb{S}^{d-1})$ and $L^1(\mathbb{S}^{d-1})$ norms of the kernels using the Hölder inequality:

$E|Z_N(b)|^{2+\delta} \le \|f_X\|_\infty (\log N)^{r(2+\delta)}\,\left\| \mathcal{H}^{-1}\left(K^-_{2T_N}(z,\cdot)\right) \right\|_{2+\delta}^{2+\delta}$

$\le \|f_X\|_\infty (\log N)^{r(2+\delta)}\,B(d,2+\delta)^{2+\delta}\,T_N^{d(2+\delta)/2}\,\left\| K^-_{2T_N}(z,\cdot) \right\|_{2+\delta}^{2+\delta}$

$\le C (\log N)^{r(2+\delta)}\,T_N^{d(2+\delta)/2}\,T_N^{(d-1)(1+\delta)}.$

By this and (9.25) an upper bound for the ratio appearing in (9.24) is given by

$C (\log N)^{r(2+\delta)} \left( \frac{T_N^{d-1}}{N} \right)^{\delta/2}.$

Therefore the Lyapounov condition is satisfied if (5.5) holds, and it follows that $N^{1/2} s_N^{-1}(b)\,S_e \to N(0,1)$
in distribution.

We now need to prove that the remaining terms $S_p$, $B_t$ and $B_a$, multiplied by $N^{1/2} s_N^{-1}$, are
$o_p(1)$. The term $S_p$ is treated in a similar manner as in the proof of Theorem 5.1.



^l^ 1 (K- TN (x h .)) (b) 



^maxC/xte), Gog #)-»•) I j= i, 



max 

,,iV 



max(/x(xi), (log A) r ) 



max /^(si), (log AT)- 



Using the Markov inequality, the empirical average in the parenthesis is of the stochastic order of 



(logAf H- l (K- Tn (*,.) 
But



(log N)' 



K 1 I K 2T N I*' 



<B(d,l)T d N /2 (logNY K- T (*,.) 



<B(d,l)T d/2 (logNY H^MIIi 



where the first inequality follows from Theorem 12,31 and the second is obtained using the defini- 
tion of the odd part and the triangle inequality. Note that the term \\K2T N (*, ")IIi i n the last line 
does not depend on • and is uniformly bounded. By the lower bound (|9.25p it is enough to show 



N l / 2 B{d,l)T N (d 1/2) |5 P (6)| = o p (l). From the inequality above, 



NV*B{d,l)T^ 1/2) \Sr,{b)\ < (iVV^r-^/ 2 (log N) r ) max 

" i=l,...,N 



max(/ x (xi), (logiV) r ) 



max (fx(xi), (log NY 



Its right hand side is of o p (l) if 

max (f x ( Xi ), (log N)- r ) 



max 

i=l,..,,N 



max (fx(xi), (log NY 



1 



o p ( N' 



-i/2 T (d-m {logNy 



which is met under ([57 

Let us now consider the bias term induced by the trimming procedure. In the proof of Theorem
5.1 we have obtained an upper bound for $\|B_t\|_\infty$ and we deduce that

$N^{1/2}\,T_N^{-(d-1/2)}\,\|B_t\|_\infty = o(1)$

when condition (5.7) is satisfied. Finally, $N^{1/2}\,T_N^{-(d-1/2)}\,\|B_a\|_\infty = o(1)$ if condition (5.6) is satisfied.
We conclude that the asymptotic normality holds for $b$ such that $f_\beta(b) > 0$. The factor 4 in the
variance comes from the fact that $f_\beta = 2f_\beta^- I$. □